Method, apparatus and non-transient media for signal decorrelation in an audio processing system
Patent Abstract:
SIGNAL DECORRELATION IN AN AUDIO PROCESSING SYSTEM. The present invention relates to audio processing methods which may involve receiving audio data corresponding to a plurality of audio channels. The audio data may include a frequency domain representation corresponding to filter bank coefficients of an audio encoding or processing system. A decorrelation process can be performed with the same filter bank coefficients used by the encoding or audio processing system. The decorrelation process can be performed without converting coefficients from the frequency domain representation to another frequency domain or time domain representation. The decorrelation process may involve selective or signal-adaptive decorrelation of specific channels and/or specific frequency bands. The decorrelation process may involve applying a decorrelation filter to a portion of the received audio data in order to produce filtered audio data. The decorrelation process may involve using a non-hierarchical mixer to combine a direct portion of the received audio data with the filtered audio data according to spatial parameters.
Publication number: BR112015018981B1
Application number: R112015018981-4
Filing date: 2014-01-22
Publication date: 2022-02-01
Inventors: Vinay Melkote; Kuan-Chieh Yen; Grant A. Davidson; Matthew Fellers; Mark S. Vinton; Vivek Kumar
Applicant: Dolby Laboratories Licensing Corporation
Patent Description:
TECHNICAL FIELD
[001] This disclosure pertains to signal processing.
BACKGROUND
[002] The development of digital encoding and decoding processes for audio and video data continues to have a significant effect on the delivery of entertainment content. Despite the increasing capacity of widely available memory and of data delivery at ever-increasing bandwidths, there is ongoing pressure to minimize the amount of data to be stored and/or transmitted. Audio and video data are often delivered together, and the bandwidth available for the audio data is often limited by the requirements of the video portion.
[003] Consequently, audio data is often encoded at high compression factors, sometimes at compression factors of 30:1 or higher. Because signal distortion increases with the amount of compression applied, trade-offs can be made between the fidelity of the decoded audio data and the storage and/or transmission efficiency of the encoded data.
[004] Furthermore, it is desirable to reduce the complexity of encoding and decoding algorithms. Encoding additional data about the encoding process may simplify the decoding process, but at the cost of storing and/or transmitting that additional encoded data. While existing audio encoding and decoding methods are generally satisfactory, improved methods would be desirable.
SUMMARY
[005] Some aspects of the subject matter described in this disclosure can be implemented in audio processing methods. Some such methods may involve receiving audio data corresponding to a plurality of audio channels. The audio data may include a frequency domain representation corresponding to filter bank coefficients of an audio encoding or processing system. The method may involve applying a decorrelation process to at least some of the audio data. In some implementations, the decorrelation process can be performed with the same filter bank coefficients used by the audio encoding or processing system.
[006] In some implementations, the decorrelation process can be performed without converting coefficients of the frequency domain representation into another frequency domain or time domain representation. The frequency domain representation can be the result of applying a critically sampled perfect reconstruction filter bank. The decorrelation process may involve generating reverberation signals or decorrelation signals by applying linear filters to at least a portion of the frequency domain representation. The frequency domain representation can be the result of applying a modified discrete sine transform, a modified discrete cosine transform, or a lapped orthogonal transform to audio data in a time domain. The decorrelation process may involve applying a decorrelation algorithm that operates entirely on real-valued coefficients.
[007] According to some implementations, the decorrelation process may involve selective or signal-adaptive decorrelation of specific channels. Alternatively or additionally, the decorrelation process may involve selective or signal-adaptive decorrelation of specific frequency bands. The decorrelation process may involve applying a decorrelation filter to a portion of the received audio data to produce filtered audio data. The decorrelation process may involve using a non-hierarchical mixer to combine a direct portion of the received audio data with the filtered audio data according to spatial parameters, as in the sketch below.
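To make paragraphs [005] to [007] concrete, here is a minimal illustrative sketch, not the patent's implementation: a linear FIR decorrelation filter is run directly on real-valued MDCT filter bank coefficients (along the block axis of each frequency bin), and a non-hierarchical mix combines the direct and filtered signals. The function name `decorrelate_block`, the filter taps, and the power-preserving weighting are all assumptions made for illustration.

```python
import numpy as np

def decorrelate_block(mdct_coeffs: np.ndarray,
                      fir_taps: np.ndarray,
                      mix_ratio: float) -> np.ndarray:
    """Sketch of frequency-domain decorrelation on one channel.

    mdct_coeffs: real-valued MDCT (filter bank) coefficients for a
        sequence of blocks, shape (num_blocks, num_bins).
    fir_taps: taps of a linear decorrelation filter applied along the
        block (time) axis of each frequency bin.
    mix_ratio: fraction of decorrelated signal in the output, standing
        in for the spatial-parameter-driven mixing coefficients.
    """
    num_blocks = mdct_coeffs.shape[0]
    # Apply the linear filter bin by bin, entirely on real-valued
    # coefficients -- no transform to another frequency or time domain.
    filtered = np.empty_like(mdct_coeffs)
    for k in range(mdct_coeffs.shape[1]):
        filtered[:, k] = np.convolve(mdct_coeffs[:, k], fir_taps)[:num_blocks]
    # Non-hierarchical mix of the direct and filtered (decorrelated)
    # portions, weighted to roughly preserve power.
    alpha = np.sqrt(1.0 - mix_ratio ** 2)   # direct-signal weight
    return alpha * mdct_coeffs + mix_ratio * filtered

# Example: 8 blocks of 256 MDCT bins, a short sparse decorrelation
# filter, and a fixed mix ratio (all values illustrative).
x = np.random.randn(8, 256)
y = decorrelate_block(x, fir_taps=np.array([0.0, 0.7, 0.0, -0.5]), mix_ratio=0.5)
```

Because everything operates on the codec's own real-valued filter bank coefficients, no additional transform stage is needed before or after the decorrelator.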
[008] In some implementations, decorrelation information may be received, either with the audio data or otherwise. The decorrelation process may involve decorrelating at least some of the audio data according to the received decorrelation information. The received decorrelation information may include correlation coefficients between individual discrete channels and a coupling channel, correlation coefficients between individual discrete channels, explicit pitch information, and/or transient information.
[009] The method may involve determining decorrelation information based on the received audio data. The decorrelation process may involve decorrelating at least some of the audio data according to the determined decorrelation information. The method may involve receiving decorrelation information encoded with the audio data. The decorrelation process may involve decorrelating at least some of the audio data according to at least one of the received decorrelation information or the determined decorrelation information.
[0010] According to some implementations, the audio encoding or processing system may be a legacy audio encoding or processing system. The method may involve receiving control mechanism elements in a bitstream produced by the legacy audio encoding or processing system. The decorrelation process may be based, at least in part, on the control mechanism elements.
[0011] In some implementations, an apparatus may include an interface and a logic system configured to receive, through the interface, audio data corresponding to a plurality of audio channels. The audio data may include a frequency domain representation corresponding to filter bank coefficients of an audio encoding or processing system. The logic system can be configured to apply a decorrelation process to at least some of the audio data. In some implementations, the decorrelation process can be performed with the same filter bank coefficients used by the audio encoding or processing system. The logic system may include at least one of a general-purpose single-chip or multi-chip processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
[0012] In some implementations, the decorrelation process can be performed without converting coefficients of the frequency domain representation to another frequency domain or time domain representation. The frequency domain representation can be the result of applying a critically sampled filter bank. The decorrelation process may involve generating reverberation signals or decorrelation signals by applying linear filters to at least a portion of the frequency domain representation. The frequency domain representation can be the result of applying a modified discrete sine transform, a modified discrete cosine transform, or a lapped orthogonal transform to audio data in a time domain. The decorrelation process may involve applying a decorrelation algorithm that operates entirely on real-valued coefficients.
[0013] The decorrelation process may involve selective or signal-adaptive decorrelation of specific channels. The decorrelation process may involve selective or signal-adaptive decorrelation of specific frequency bands. The decorrelation process may involve applying a decorrelation filter to a portion of the received audio data to produce filtered audio data.
In some implementations, the decorrelation process may involve using a non-hierarchical mixer to combine a direct portion of the received audio data with the filtered audio data according to spatial parameters.
[0014] The apparatus may include a memory device. In some implementations, the interface may be an interface between the logic system and the memory device. Alternatively, the interface may be a network interface.
[0015] The audio encoding or processing system may be a legacy audio encoding or processing system. In some implementations, the logic system may be additionally configured to receive, through the interface, control mechanism elements in a bitstream produced by the legacy audio encoding or processing system. The decorrelation process may be based, at least in part, on the control mechanism elements.
[0016] Some aspects of this disclosure can be implemented in a non-transient medium that has software stored on it. The software may include instructions for controlling an apparatus to receive audio data corresponding to a plurality of audio channels. The audio data may include a frequency domain representation corresponding to filter bank coefficients of an audio encoding or processing system. The software may include instructions for controlling the apparatus to apply a decorrelation process to at least some of the audio data. In some implementations, the decorrelation process is performed with the same filter bank coefficients used by the audio encoding or processing system.
[0017] In some implementations, the decorrelation process can be performed without converting coefficients of the frequency domain representation to another frequency domain or time domain representation. The frequency domain representation can be the result of applying a critically sampled filter bank. The decorrelation process may involve generating reverberation signals or decorrelation signals by applying linear filters to at least a portion of the frequency domain representation. The frequency domain representation can be the result of applying a modified discrete sine transform, a modified discrete cosine transform, or a lapped orthogonal transform to audio data in a time domain. The decorrelation process may involve applying a decorrelation algorithm that operates entirely on real-valued coefficients.
[0018] Some methods may involve receiving audio data corresponding to a plurality of audio channels and determining audio characteristics of the audio data. The audio characteristics may include transient information. The methods may involve determining an amount of decorrelation for the audio data based, at least in part, on the audio characteristics and processing the audio data according to the determined amount of decorrelation.
[0019] In some instances, no explicit transient information may be received with the audio data. In some implementations, the process of determining transient information may involve detecting a soft transient event.
[0020] The process of determining transient information may involve assessing a probability and/or a severity of a transient event. The process of determining transient information may involve evaluating a temporal power variation in the audio data.
[0021] The process of determining audio characteristics may involve receiving explicit transient information with the audio data.
The explicit transient information may include at least one of a transient control value corresponding to a defined transient event, a transient control value corresponding to a defined non-transient event, or an intermediate transient control value. The explicit transient information may include an intermediate transient control value or a transient control value corresponding to a defined transient event. The transient control value can be subjected to an exponential decay function.
[0022] The explicit transient information may indicate a defined transient event. Processing the audio data may involve temporarily halting or slowing down a decorrelation process. The explicit transient information may include a transient control value corresponding to a defined non-transient event or an intermediate transient control value. The process of determining transient information may involve detecting a soft transient event. The process of detecting a soft transient event may involve evaluating at least one of a probability or a severity of a transient event.
[0023] The determined transient information can be a determined transient control value corresponding to the soft transient event. The method may involve combining the determined transient control value with the received transient control value to obtain a new transient control value. The process of combining the determined transient control value and the received transient control value may involve taking the maximum of the determined transient control value and the received transient control value.
[0024] The process of detecting a soft transient event may involve detecting a temporal power variation of the audio data. Detecting the temporal power variation may involve determining a variation in a logarithmic power average. The logarithmic power average may be a frequency-band-weighted logarithmic average. Determining the variation in the logarithmic power average may involve determining a time-asymmetric power differential. The asymmetric power differential may emphasize increasing power and de-emphasize decreasing power. The method may involve determining a raw transient measure based on the asymmetric power differential. Determining the raw transient measure may involve computing a transient event probability function based on an assumption that the asymmetric power differential is distributed according to a Gaussian distribution. The method may involve determining a transient control value based on the raw transient measure. The method may involve applying an exponential decay function to the transient control value. A sketch of this chain of operations follows paragraph [0026] below.
[0025] Some methods may involve applying a decorrelation filter to a portion of the audio data to produce filtered audio data and mixing the filtered audio data with a portion of the received audio data according to a mixing ratio. The process of determining the amount of decorrelation may involve modifying the mixing ratio based, at least in part, on the transient control value.
[0026] Some methods may involve applying a decorrelation filter to a portion of the audio data to produce filtered audio data. Determining the amount of decorrelation for the audio data may involve smoothing an input to the decorrelation filter based on the transient information. The process of determining an amount of decorrelation for the audio data may involve reducing an amount of decorrelation in response to the detection of a soft transient event.
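A minimal sketch of the soft-transient chain in paragraph [0024]. All names and constants here (`band_weights`, the 0.25 de-emphasis factor, the Gaussian parameters, the decay coefficient) are illustrative assumptions, not the patent's values:

```python
import numpy as np
from math import erf, sqrt

def transient_control_value(bands_power: np.ndarray,
                            band_weights: np.ndarray,
                            prev_control: float = 0.0,
                            mean: float = 0.1,
                            std: float = 0.2,
                            decay: float = 0.8) -> float:
    """Soft transient detection for one pair of consecutive blocks.

    bands_power: per-band power for the previous and current block,
        shape (2, num_bands); band_weights: frequency weights summing to 1.
    """
    # Frequency-band-weighted logarithmic power average of each block.
    log_avg = np.log(np.maximum(bands_power, 1e-12)) @ band_weights

    # Time-asymmetric power differential: emphasize rising power,
    # de-emphasize falling power.
    diff = log_avg[1] - log_avg[0]
    asym_diff = diff if diff >= 0.0 else 0.25 * diff

    # Raw transient measure: probability of a transient event, assuming
    # the asymmetric differential is Gaussian-distributed.
    raw = 0.5 * (1.0 + erf((asym_diff - mean) / (std * sqrt(2.0))))

    # Transient control value with an exponential decay function: the
    # control can rise immediately on an event but decays gradually.
    return max(raw, decay * prev_control)
```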
[0027] Processing the audio data may involve applying a decorrelation filter to a portion of the audio data to produce filtered audio data and mixing the filtered audio data with a portion of the received audio data according to a mixing ratio. The process of reducing the amount of decorrelation may involve modifying the mixing ratio.
[0028] Processing the audio data may involve applying a decorrelation filter to a portion of the audio data to produce filtered audio data, determining a gain to apply to the filtered audio data, applying the gain to the filtered audio data and mixing the filtered audio data with a portion of the received audio data.
[0029] The gain determination process may involve matching a power of the filtered audio data with a power of the received audio data. In some implementations, the processes of determining and applying the gain can be performed by a bank of signal level compressors (duckers). The bank of duckers can include buffers. A fixed delay can be applied to the filtered audio data, and the same delay can be applied to the buffers.
[0030] At least one of a power estimation smoothing window for the duckers or the gain to be applied to the filtered audio data may be based, at least in part, on the determined transient information. In some implementations, a shorter smoothing window may be applied when a transient event is relatively more likely or a relatively stronger transient event is detected, and a longer smoothing window may be applied when a transient event is relatively less likely, a relatively weaker transient event is detected, or no transient event is detected. A ducker sketch along these lines appears after paragraph [0033] below.
[0031] Some methods may involve applying a decorrelation filter to a portion of the audio data to produce filtered audio data, determining a ducker gain to be applied to the filtered audio data, applying the ducker gain to the filtered audio data and mixing the filtered audio data with a portion of the received audio data according to a mixing ratio. The process of determining the amount of decorrelation may involve modifying the mixing ratio based on at least one of the transient information or the ducker gain.
[0032] The process of determining the audio characteristics may involve determining at least one of a block-switched channel, a channel out of coupling, or coupling not being in use. Determining an amount of decorrelation for the audio data may involve determining whether a decorrelation process should be reduced or temporarily halted.
[0033] Processing the audio data may involve a decorrelation filter dithering process. The method may involve determining that the decorrelation filter dithering process should be modified or temporarily halted based, at least in part, on the transient information. According to some methods, it can be determined that the decorrelation filter dithering process will be modified by changing a maximum step value for dithering the poles of the decorrelation filter.
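A sketch of one ducker from the bank described in paragraphs [0028] to [0030]. The exponential smoother standing in for the power estimation smoothing window, the interpolation of its coefficient with the transient control value, and the `state` dictionary are illustrative assumptions:

```python
import numpy as np

def duck_filtered_signal(direct: np.ndarray,
                         filtered: np.ndarray,
                         transient_control: float,
                         state: dict) -> np.ndarray:
    """Match the filtered (decorrelated) signal power to the direct
    signal power in one frequency band; `state` carries the smoothed
    power estimates from block to block."""
    # Shorter smoothing window when a transient is likely or strong,
    # longer when not: the smoothing coefficient interpolates between a
    # slow (0.9) and a fast (0.1) setting via the transient control value.
    alpha = 0.9 * (1.0 - transient_control) + 0.1 * transient_control

    p_direct = alpha * state.get("p_direct", 0.0) + (1 - alpha) * np.mean(direct ** 2)
    p_filt = alpha * state.get("p_filt", 0.0) + (1 - alpha) * np.mean(filtered ** 2)
    state["p_direct"], state["p_filt"] = p_direct, p_filt

    # Gain matching the filtered power to the direct power, capped at
    # unity so the ducker only ever attenuates.
    gain = min(1.0, float(np.sqrt(p_direct / max(p_filt, 1e-12))))
    return gain * filtered
```

In the arrangement of paragraph [0029], one such ducker would run per frequency band, with the fixed delay applied both to the filtered audio data and to the buffers holding the power estimates.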
[0034] According to some implementations, an apparatus may include an interface and a logic system. The logic system may be configured to receive, from the interface, audio data corresponding to a plurality of audio channels and to determine audio characteristics of the audio data. The audio characteristics may include transient information. The logic system can be configured to determine an amount of decorrelation for the audio data based, at least in part, on the audio characteristics and to process the audio data according to the determined amount of decorrelation.
[0035] In some implementations, no explicit transient information may be received with the audio data. The process of determining transient information may involve detecting a soft transient event. The process of determining transient information may involve evaluating at least one of a probability or a severity of a transient event. The process of determining transient information may involve evaluating a temporal power variation in the audio data.
[0036] In some implementations, determining audio characteristics may involve receiving explicit transient information with the audio data. The explicit transient information may indicate at least one of a transient control value corresponding to a defined transient event, a transient control value corresponding to a defined non-transient event, or an intermediate transient control value. The explicit transient information may include an intermediate transient control value or a transient control value corresponding to a defined transient event. The transient control value can be subjected to an exponential decay function.
[0037] If the explicit transient information indicates a defined transient event, processing the audio data may involve temporarily reducing or halting a decorrelation process. If the explicit transient information includes a transient control value corresponding to a defined non-transient event or an intermediate transient control value, the transient information determination process may involve detecting a soft transient event. The determined transient information can be a determined transient control value corresponding to the soft transient event.
[0038] The logic system can be further configured to combine the determined transient control value with the received transient control value to obtain a new transient control value. In some implementations, the process of combining the determined transient control value and the received transient control value may involve taking the maximum of the determined transient control value and the received transient control value.
[0039] The process of detecting a soft transient event may involve evaluating at least one of a probability or a severity of a transient event. The process of detecting a soft transient event may involve detecting a temporal power variation of the audio data.
[0040] In some implementations, the logic system can be further configured to apply a decorrelation filter to a portion of the audio data to produce filtered audio data and to mix the filtered audio data with a portion of the received audio data according to a mixing ratio. The process of determining the amount of decorrelation may involve modifying the mixing ratio based, at least in part, on the transient information.
[0041] The process of determining an amount of decorrelation for the audio data may involve reducing an amount of decorrelation in response to the detection of the soft transient event. Processing the audio data may involve applying a decorrelation filter to a portion of the audio data to produce filtered audio data and mixing the filtered audio data with a portion of the received audio data according to a mixing ratio. The process of reducing the amount of decorrelation may involve modifying the mixing ratio, for instance as in the sketch below.
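A small sketch combining paragraphs [0038], [0040] and [0041]: the received and locally determined transient control values are combined by taking their maximum, and the mixing ratio is then pulled toward the direct signal as the control value rises. The power-preserving weighting is an illustrative choice, not the patent's mixing equation:

```python
def mix_with_transient_control(direct, filtered, decorr_weight,
                               received_tc, determined_tc):
    """Reduce decorrelation on transients by modifying the mixing ratio.

    direct, filtered: sequences of coefficients for one channel/band;
    decorr_weight: nominal decorrelated-signal weight in [0, 1];
    received_tc, determined_tc: transient control values in [0, 1].
    """
    control = max(received_tc, determined_tc)   # new transient control value
    beta = decorr_weight * (1.0 - control)      # decorrelated-signal weight
    alpha = (1.0 - beta * beta) ** 0.5          # direct-signal weight (power preserving)
    return [alpha * d + beta * f for d, f in zip(direct, filtered)]
```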
[0042] Processing the audio data may involve applying a decorrelation filter to a portion of the audio data to produce filtered audio data, determining a gain to apply to the filtered audio data, applying the gain to the filtered audio data and mixing the filtered audio data with a portion of the received audio data. The gain determination process may involve matching a power of the filtered audio data to a power of the received audio data. The logic system may include a bank of duckers configured to perform the gain determination and gain application processes.
[0043] Some aspects of the present disclosure may be implemented in a non-transient medium that has software stored on it. The software may include instructions for controlling an apparatus to receive audio data corresponding to a plurality of audio channels and to determine audio characteristics of the audio data. In some implementations, the audio characteristics may include transient information. The software may include instructions for controlling the apparatus to determine an amount of decorrelation for the audio data based, at least in part, on the audio characteristics and to process the audio data in accordance with the determined amount of decorrelation.
[0044] In some instances, no explicit transient information may be received with the audio data. The process of determining transient information may involve detecting a soft transient event. The process of determining transient information may involve evaluating at least one of a probability or a severity of a transient event. The process of determining transient information may involve evaluating a temporal power variation in the audio data.
[0045] However, in some implementations, determining audio characteristics may involve receiving explicit transient information with the audio data. The explicit transient information may include a transient control value corresponding to a defined transient event, a transient control value corresponding to a defined non-transient event, and/or an intermediate transient control value. If the explicit transient information indicates a transient event, processing the audio data may involve temporarily halting or slowing a decorrelation process.
[0046] If the explicit transient information includes a transient control value corresponding to a defined non-transient event or an intermediate transient control value, the transient information determination process may involve detecting a soft transient event. The determined transient information can be a determined transient control value corresponding to the soft transient event. The process of determining transient information may involve combining the determined transient control value with the received transient control value to obtain a new transient control value. The process of combining the determined transient control value and the received transient control value may involve taking the maximum of the determined transient control value and the received transient control value.
[0047] The process of detecting a soft transient event may involve evaluating at least one of a probability or a severity of a transient event. The process of detecting a soft transient event may involve detecting a temporal power variation of the audio data.
[0048] The software may include instructions for controlling the apparatus to apply a decorrelation filter to a portion of the audio data to produce filtered audio data and to mix the filtered audio data with a portion of the received audio data in accordance with a mixing ratio.
The process of determining the amount of decorrelation may involve modifying the mixing ratio based, at least in part, on the transient information. The process of determining an amount of decorrelation for the audio data may involve reducing an amount of decorrelation in response to the detection of the soft transient event.
[0049] Processing the audio data may involve applying a decorrelation filter to a portion of the audio data to produce filtered audio data and mixing the filtered audio data with a portion of the received audio data according to a mixing ratio. The process of reducing the amount of decorrelation may involve modifying the mixing ratio.
[0050] Processing the audio data may involve applying a decorrelation filter to a portion of the audio data to produce filtered audio data, determining a gain to apply to the filtered audio data, applying the gain to the filtered audio data and mixing the filtered audio data with a portion of the received audio data. The gain determination process may involve matching a power of the filtered audio data to a power of the received audio data.
[0051] Some methods may involve receiving audio data corresponding to a plurality of audio channels and determining audio characteristics of the audio data. The audio characteristics may include transient information. The transient information can include an intermediate transient control value that indicates a transient value between a defined transient event and a defined non-transient event. These methods may also involve forming encoded audio data frames that include encoded transient information.
[0052] The encoded transient information may include one or more control flags. The method may involve coupling at least a portion of two or more channels of the audio data into at least one coupling channel. The control flags can include at least one of a channel block switch flag, a channel out-of-coupling flag, or a coupling-in-use flag. The method may involve determining a combination of one or more of the control flags to form encoded transient information that indicates at least one of a defined transient event, a defined non-transient event, a probability of a transient event, or a severity of a transient event. One such flag combination is sketched after paragraph [0055] below.
[0053] The process of determining transient information may involve assessing at least one of a probability or a severity of a transient event. The encoded transient information can indicate at least one of a defined transient event, a defined non-transient event, the probability of a transient event, or the severity of a transient event. The process of determining transient information may involve evaluating a temporal power variation in the audio data.
[0054] The encoded transient information may include a transient control value corresponding to a transient event. The transient control value can be subjected to an exponential decay function. The transient information may indicate that a decorrelation process should be temporarily slowed down or halted.
[0055] The transient information may indicate that a mixing ratio of a decorrelation process should be modified. For example, the transient information may indicate that an amount of decorrelation in a decorrelation process should be temporarily reduced.
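One way a combination of the control flags of paragraph [0052] could be folded into a small transient code. The flag-to-code mapping, the two-bit width, and the probability thresholds are illustrative assumptions, not the patent's encoding:

```python
def encode_transient_info(block_switch: bool,
                          channel_out_of_coupling: bool,
                          coupling_in_use: bool,
                          soft_transient_probability: float) -> int:
    """Form a 2-bit transient code from control flags and a soft
    transient probability (all semantics hypothetical)."""
    if block_switch or channel_out_of_coupling or not coupling_in_use:
        return 3   # defined transient event (or decorrelation not applicable)
    if soft_transient_probability >= 0.5:
        return 2   # intermediate value: transient likely or severe
    if soft_transient_probability > 0.0:
        return 1   # intermediate value: weak or unlikely transient
    return 0       # defined non-transient event
```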
[0056] Some methods may involve receiving audio data corresponding to a plurality of audio channels and determining audio characteristics of the audio data. The audio characteristics can include spatial parameter data. The methods may involve determining at least two decorrelation filtering processes for the audio data based, at least in part, on the audio characteristics. The decorrelation filtering processes can cause a specific inter-decorrelation signal coherence ("IDC") between channel-specific decorrelation signals for at least one pair of channels. The decorrelation filtering processes may involve applying a decorrelation filter to at least a portion of the audio data in order to produce filtered audio data. The channel-specific decorrelation signals can be produced by performing operations on the filtered audio data.
[0057] The methods may involve applying the decorrelation filtering processes to at least a portion of the audio data in order to produce the channel-specific decorrelation signals, determining mixing parameters based, at least in part, on the audio characteristics, and mixing the channel-specific decorrelation signals with a direct portion of the audio data according to the mixing parameters. The direct portion can correspond to the portion to which the decorrelation filter is applied.
[0058] The method may also involve receiving information regarding a number of output channels. The process of determining at least two decorrelation filtering processes for the audio data may be based, at least in part, on the number of output channels. The receiving process may involve receiving audio data corresponding to N input audio channels. The method may involve determining that the audio data for the N input audio channels will be downmixed or upmixed to audio data for K audio output channels and producing decorrelated audio data corresponding to the K audio output channels.
[0059] The method may involve downmixing or upmixing the audio data for the N input audio channels to audio data for M intermediate audio channels, producing decorrelated audio data for the M intermediate audio channels, and downmixing or upmixing the decorrelated audio data for the M intermediate audio channels to decorrelated audio data for the K audio output channels. The determination of the at least two decorrelation filtering processes for the audio data may be based, at least in part, on the number M of intermediate audio channels. The decorrelation filtering processes can be determined based, at least in part, on N-to-K, M-to-K, or N-to-M mixing equations.
[0060] The method may also involve controlling inter-channel coherence ("ICC") between a plurality of audio channel pairs. The process of controlling ICC may involve at least one of receiving an ICC value or determining an ICC value based, at least in part, on the spatial parameter data.
[0061] The process of controlling ICC may involve at least one of receiving a set of ICC values or determining the set of ICC values based, at least in part, on the spatial parameter data. The method may also involve determining a set of IDC values based, at least in part, on the set of ICC values, and synthesizing a set of channel-specific decorrelation signals that correspond to the set of IDC values by performing operations on the filtered audio data. One simple model of the ICC-to-IDC relation is sketched below.
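To see how a set of IDC values might be derived from target ICC values (paragraphs [0060] and [0061]), consider a simple model that is an assumption for illustration, not the patent's equations: each output is y_i = a_i x + b_i d_i, where x is the direct (coupling channel) signal, d_i is a unit-power decorrelation signal uncorrelated with x, and a_i^2 + b_i^2 = 1. The pairwise coherence is then ICC_ij = a_i a_j + b_i b_j IDC_ij, which inverts as follows:

```python
import numpy as np

def idc_from_icc(icc: float, alpha_i: float, alpha_j: float) -> float:
    """Invert ICC = a_i*a_j + b_i*b_j*IDC under the simple mixing model
    described above (illustrative, not the patent's derivation)."""
    beta_i = np.sqrt(1.0 - alpha_i ** 2)   # decorrelated-signal weights
    beta_j = np.sqrt(1.0 - alpha_j ** 2)
    idc = (icc - alpha_i * alpha_j) / max(beta_i * beta_j, 1e-12)
    return float(np.clip(idc, -1.0, 1.0))  # coherence stays in [-1, 1]
```

The resulting IDC targets are what the synthesizer described later would aim for when producing the channel-specific decorrelation signals.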
[0062] The method may also involve a conversion process between a first representation of the spatial parameter data and a second representation of the spatial parameter data. The first representation of the spatial parameter data may include a representation of the coherence between individual discrete channels and a coupling channel. The second representation of the spatial parameter data may include a representation of the coherence between the individual discrete channels.
[0063] The process of applying the decorrelation filtering processes to at least a portion of the audio data may involve applying the same decorrelation filter to the audio data for a plurality of channels to produce the filtered audio data and multiplying the filtered audio data corresponding to a left channel or a right channel by -1. The method may also involve reversing a polarity of the filtered audio data corresponding to a left surround channel with reference to the filtered audio data corresponding to the left channel, and reversing a polarity of the filtered audio data corresponding to a right surround channel with reference to the filtered audio data corresponding to the right channel.
[0064] The process of applying the decorrelation filtering processes to at least a portion of the audio data may involve applying a first decorrelation filter to the audio data for a first and a second channel to produce first channel filtered data and second channel filtered data, and applying a second decorrelation filter to the audio data for a third and a fourth channel to produce third channel filtered data and fourth channel filtered data. The first channel can be a left channel, the second channel can be a right channel, the third channel can be a left surround channel, and the fourth channel can be a right surround channel. The method may also involve reversing a polarity of the first channel filtered data relative to the second channel filtered data and reversing a polarity of the third channel filtered data relative to the fourth channel filtered data. The process of determining at least two decorrelation filtering processes for the audio data may involve determining that a different decorrelation filter will be applied to the audio data for a center channel or determining that a decorrelation filter will not be applied to the audio data for the center channel. Both polarity schemes are sketched after paragraph [0065] below.
[0065] The method may also involve receiving channel-specific scaling factors and a coupling channel signal corresponding to a plurality of coupled channels. The application process may involve applying at least one of the decorrelation filtering processes to the coupling channel to generate channel-specific filtered audio data, and applying the channel-specific scaling factors to the channel-specific filtered audio data to produce the channel-specific decorrelation signals.
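The sign-flip schemes of paragraphs [0063] and [0064], written out directly. The dictionary keys are illustrative channel names; the inputs may be floats or NumPy arrays of filtered audio data:

```python
def polarity_scheme(f_lr, f_surround=None):
    """Channel-specific decorrelation signals via polarity inversion.

    f_lr: filtered audio data from the (first) decorrelation filter;
    f_surround: filtered data from a second filter for the surround
    pair, or None for the single shared filter of paragraph [0063].
    """
    if f_surround is None:
        # One shared filter: R flipped against L, Ls flipped against L,
        # and Rs flipped against R (so Rs shares L's sign).
        return {"L": f_lr, "R": -f_lr, "Ls": -f_lr, "Rs": f_lr}
    # Two filters ([0064]): polarity flipped only within each pair.
    return {"L": f_lr, "R": -f_lr, "Ls": f_surround, "Rs": -f_surround}
```

Either way, each pair's decorrelation signals are negatively correlated, which pushes the pairwise IDC, and hence the rendered ICC, toward the desired spread.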
[0066] The method may also involve determining decorrelation signal synthesis parameters based, at least in part, on the spatial parameter data. The decorrelation signal synthesis parameters can be output-channel-specific decorrelation signal synthesis parameters. The method may also involve receiving a coupling channel signal corresponding to a plurality of coupled channels and channel-specific scaling factors. At least one of the processes of determining at least two decorrelation filtering processes for the audio data and applying the decorrelation filtering processes to a portion of the audio data may involve generating a set of seed decorrelation signals by applying a set of decorrelation filters to the coupling channel signal, sending the seed decorrelation signals to a synthesizer, applying the output-channel-specific decorrelation signal synthesis parameters to the seed decorrelation signals received by the synthesizer to produce channel-specific synthesized decorrelation signals, multiplying the channel-specific synthesized decorrelation signals by the appropriate channel-specific scaling factors for each channel to produce channel-specific scaled synthesized decorrelation signals, and outputting the channel-specific scaled synthesized decorrelation signals to a direct signal and decorrelation signal mixer. This seed-and-synthesizer flow is sketched after paragraph [0069] below.
[0067] The method may also involve receiving channel-specific scaling factors. At least one of the processes of determining at least two decorrelation filtering processes for the audio data and applying the decorrelation filtering processes to a portion of the audio data may involve: generating a set of channel-specific seed decorrelation signals by applying a set of decorrelation filters to the audio data; sending the channel-specific seed decorrelation signals to a synthesizer; determining a set of channel-pair-specific level adjustment parameters based, at least in part, on the channel-specific scaling factors; applying the output-channel-specific decorrelation signal synthesis parameters and the channel-pair-specific level adjustment parameters to the channel-specific seed decorrelation signals received by the synthesizer to produce channel-specific synthesized decorrelation signals; and outputting the channel-specific synthesized decorrelation signals to a direct signal and decorrelation signal mixer.
[0068] Determining the output-channel-specific decorrelation signal synthesis parameters may involve determining a set of IDC values based, at least in part, on the spatial parameter data and determining output-channel-specific decorrelation signal synthesis parameters that correspond with the set of IDC values. The set of IDC values can be determined, at least in part, according to a coherence between individual discrete channels and a coupling channel and a coherence between pairs of individual discrete channels.
[0069] The mixing process may involve using a non-hierarchical mixer to combine the channel-specific decorrelation signals with the direct portion of the audio data. Determining audio characteristics may involve receiving explicit audio characteristic information with the audio data. Determining audio characteristics may involve determining audio characteristic information based on one or more attributes of the audio data. The spatial parameter data may include a representation of the coherence between individual discrete channels and a coupling channel and/or a representation of the coherence between pairs of individual discrete channels. The audio characteristics can include at least one of pitch information or transient information.
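A sketch of the seed-and-synthesizer flow in paragraphs [0066] and [0067], assuming NumPy and treating the synthesis parameters as linear combination weights over the seeds (an illustrative stand-in for the patent's synthesis parameters):

```python
import numpy as np

def synthesize_decorrelation_signals(coupling_channel: np.ndarray,
                                     filters: list,
                                     synth_params: np.ndarray,
                                     scaling: np.ndarray) -> np.ndarray:
    """Seed decorrelation signals -> synthesizer -> scaled outputs.

    coupling_channel: coupling channel signal, shape (n,).
    filters: FIR taps of the decorrelation filter bank (one per seed).
    synth_params: synthesis weights, shape (num_channels, num_seeds).
    scaling: channel-specific scaling factors, shape (num_channels,).
    """
    # A bank of decorrelation filters turns the coupling channel into
    # seed decorrelation signals.
    seeds = np.stack([np.convolve(coupling_channel, h)[:len(coupling_channel)]
                      for h in filters])            # (num_seeds, n)
    # The synthesizer forms each output channel's decorrelation signal
    # as a linear combination of the seeds.
    synthesized = synth_params @ seeds              # (num_channels, n)
    # Channel-specific scaling factors are applied last, before the
    # signals go to the direct signal and decorrelation signal mixer.
    return scaling[:, None] * synthesized
```

For example, with two roughly uncorrelated unit-power seeds and synthesis rows of the form (cos θ_i, sin θ_i), the pairwise IDC between synthesized signals is cos(θ_i − θ_j), which is one way such a synthesizer could realize a target IDC set.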
[0070] The determination of mixing parameters may be based, at least in part, on the spatial parameter data. The method may also involve providing the mixing parameters to a direct signal and decorrelation signal mixer. The mixing parameters can be output-channel-specific mixing parameters. The method may also involve determining modified output-channel-specific mixing parameters based, at least in part, on the output-channel-specific mixing parameters and transient control information.
[0071] According to some implementations, an apparatus may include an interface and a logic system configured to receive audio data corresponding to a plurality of audio channels and to determine audio characteristics of the audio data. The audio characteristics can include spatial parameter data. The logic system can be configured to determine at least two decorrelation filtering processes for the audio data based, at least in part, on the audio characteristics. The decorrelation filtering processes can cause a specific IDC between channel-specific decorrelation signals for at least one pair of channels. The decorrelation filtering processes may involve applying a decorrelation filter to at least a portion of the audio data to produce filtered audio data. The channel-specific decorrelation signals can be produced by performing operations on the filtered audio data.
[0072] The logic system can be configured to: apply the decorrelation filtering processes to at least a portion of the audio data in order to produce the channel-specific decorrelation signals; determine mixing parameters based, at least in part, on the audio characteristics; and mix the channel-specific decorrelation signals with a direct portion of the audio data according to the mixing parameters. The direct portion can correspond to the portion to which the decorrelation filter is applied.
[0073] The receiving process may involve receiving information regarding a number of output channels. The process of determining at least two decorrelation filtering processes for the audio data may be based, at least in part, on the number of output channels. For example, the receiving process may involve receiving audio data corresponding to N input audio channels, and the logic system can be configured to determine that the audio data for the N input audio channels will be downmixed or upmixed to audio data for K audio output channels and to produce decorrelated audio data corresponding to the K audio output channels.
[0074] The logic system can be further configured to: downmix or upmix the audio data for the N input audio channels to audio data for M intermediate audio channels; produce decorrelated audio data for the M intermediate audio channels; and downmix or upmix the decorrelated audio data for the M intermediate audio channels to decorrelated audio data for the K audio output channels.
[0075] The decorrelation filtering processes can be determined based, at least in part, on N-to-K mixing equations. The determination of the at least two decorrelation filtering processes for the audio data can be based, at least in part, on the number M of intermediate audio channels. The decorrelation filtering processes can be determined based, at least in part, on M-to-K or N-to-M mixing equations.
[0076] The logic system can be further configured to control ICC between a plurality of audio channel pairs. The process of controlling ICC may involve at least one of receiving an ICC value or determining an ICC value based, at least in part, on the spatial parameter data. The logic system can be further configured to determine a set of IDC values based, at least in part, on a set of ICC values, and to synthesize a set of channel-specific decorrelation signals that correspond to the set of IDC values by performing operations on the filtered audio data.
[0077] The logic system can be additionally configured for a conversion process between a first representation of the spatial parameter data and a second representation of the spatial parameter data. The first representation of the spatial parameter data may include a representation of the coherence between individual discrete channels and a coupling channel. The second representation of the spatial parameter data may include a representation of the coherence between the individual discrete channels.
[0078] The process of applying the decorrelation filtering processes to at least a portion of the audio data may involve applying the same decorrelation filter to the audio data for a plurality of channels to produce the filtered audio data and multiplying the filtered audio data corresponding to a left channel or a right channel by -1. The logic system can be further configured to reverse a polarity of the filtered audio data corresponding to a left surround channel with reference to the filtered audio data corresponding to the left channel and to reverse a polarity of the filtered audio data corresponding to a right surround channel with reference to the filtered audio data corresponding to the right channel.
[0079] The process of applying the decorrelation filtering processes to at least a portion of the audio data may involve applying a first decorrelation filter to the audio data for a first and a second channel to produce first channel filtered data and second channel filtered data, and applying a second decorrelation filter to the audio data for a third and a fourth channel to produce third channel filtered data and fourth channel filtered data. The first channel can be a left channel, the second channel can be a right channel, the third channel can be a left surround channel, and the fourth channel can be a right surround channel.
[0080] The logic system can be further configured to reverse a polarity of the first channel filtered data relative to the second channel filtered data and to reverse a polarity of the third channel filtered data relative to the fourth channel filtered data. The process of determining at least two decorrelation filtering processes for the audio data may involve determining that a different decorrelation filter will be applied to the audio data for a center channel or determining that a decorrelation filter will not be applied to the audio data for the center channel.
[0081] The logic system can be further configured to receive, from the interface, channel-specific scaling factors and a coupling channel signal corresponding to a plurality of coupled channels. The application process may involve applying at least one of the decorrelation filtering processes to the coupling channel to generate channel-specific filtered audio data, and applying the channel-specific scaling factors to the channel-specific filtered audio data to produce the channel-specific decorrelation signals.
[0082] The logic system can be further configured to determine decorrelation signal synthesis parameters based, at least in part, on the spatial parameter data. The decorrelation signal synthesis parameters can be output-channel-specific decorrelation signal synthesis parameters. The logic system may be further configured to receive, from the interface, a coupling channel signal corresponding to a plurality of coupled channels and channel-specific scaling factors.
[0083] At least one of the processes of determining at least two decorrelation filtering processes for the audio data and applying the decorrelation filtering processes to a portion of the audio data may involve: generating a set of seed decorrelation signals by applying a set of decorrelation filters to the coupling channel signal; sending the seed decorrelation signals to a synthesizer; applying the output-channel-specific decorrelation signal synthesis parameters to the seed decorrelation signals received by the synthesizer to produce channel-specific synthesized decorrelation signals; multiplying the channel-specific synthesized decorrelation signals by the appropriate channel-specific scaling factors for each channel to produce channel-specific scaled synthesized decorrelation signals; and outputting the channel-specific scaled synthesized decorrelation signals to a direct signal and decorrelation signal mixer.
[0084] At least one of the processes of determining at least two decorrelation filtering processes for the audio data and applying the decorrelation filtering processes to a portion of the audio data may involve: generating a set of channel-specific seed decorrelation signals by applying a set of decorrelation filters to channel-specific audio data; sending the channel-specific seed decorrelation signals to a synthesizer; determining channel-pair-specific level adjustment parameters based, at least in part, on the channel-specific scaling factors; applying the output-channel-specific decorrelation signal synthesis parameters and the channel-pair-specific level adjustment parameters to the channel-specific seed decorrelation signals received by the synthesizer to produce channel-specific synthesized decorrelation signals; and outputting the channel-specific synthesized decorrelation signals to a direct signal and decorrelation signal mixer.
[0085] Determining the output-channel-specific decorrelation signal synthesis parameters may involve determining a set of IDC values based, at least in part, on the spatial parameter data and determining output-channel-specific decorrelation signal synthesis parameters that correspond with the set of IDC values. The set of IDC values can be determined, at least in part, according to a coherence between individual discrete channels and a coupling channel and a coherence between pairs of individual discrete channels.
[0086] The mixing process may involve using a non-hierarchical mixer to combine the channel-specific decorrelation signals with the direct portion of the audio data. Determining audio characteristics may involve receiving explicit audio characteristic information with the audio data. Determining audio characteristics may involve determining audio characteristic information based on one or more attributes of the audio data. The audio characteristics may include pitch information and/or transient information.
[0087] The spatial parameter data may include a representation of the coherence between individual discrete channels and a coupling channel and/or a representation of the coherence between pairs of individual discrete channels. The determination of mixing parameters can be based, at least in part, on the spatial parameter data.
[0088] The logic system can be additionally configured to provide the mixing parameters to a direct signal and decorrelation signal mixer. The mixing parameters can be output-channel-specific mixing parameters.
The logic system can be further configured to determine modified output-channel-specific mixing parameters based, at least in part, on the output-channel-specific mixing parameters and transient control information.
[0089] The apparatus may include a memory device. The interface can be an interface between the logic system and the memory device. Alternatively, the interface can be a network interface.
[0090] Some aspects of this disclosure may be implemented in non-transient media that have software stored on them. The software may include instructions for controlling an apparatus to receive audio data corresponding to a plurality of audio channels and to determine audio characteristics of the audio data. The audio characteristics may include spatial parameter data. The software may include instructions for controlling the apparatus to determine at least two decorrelation filtering processes for the audio data based, at least in part, on the audio characteristics. The decorrelation filtering processes can cause a specific IDC between channel-specific decorrelation signals for at least one pair of channels. The decorrelation filtering processes may involve applying a decorrelation filter to at least a portion of the audio data to produce filtered audio data. The channel-specific decorrelation signals can be produced by performing operations on the filtered audio data.
[0091] The software may include instructions for controlling the apparatus to apply the decorrelation filtering processes to at least a portion of the audio data to produce the channel-specific decorrelation signals; to determine mixing parameters based, at least in part, on the audio characteristics; and to mix the channel-specific decorrelation signals with a direct portion of the audio data in accordance with the mixing parameters. The direct portion can correspond to the portion to which the decorrelation filter is applied.
[0092] The software may include instructions for controlling the apparatus to receive information regarding a number of output channels. The process of determining at least two decorrelation filtering processes for the audio data may be based, at least in part, on the number of output channels. For example, the receiving process may involve receiving audio data corresponding to N input audio channels. The software may include instructions for controlling the apparatus to determine that the audio data for the N input audio channels will be downmixed or upmixed to audio data for K audio output channels and to produce decorrelated audio data corresponding to the K audio output channels.
[0093] The software may include instructions for controlling the apparatus to: downmix or upmix the audio data for the N input audio channels to audio data for M intermediate audio channels; produce decorrelated audio data for the M intermediate audio channels; and downmix or upmix the decorrelated audio data for the M intermediate audio channels to decorrelated audio data for the K audio output channels. This N-to-M-to-K flow is sketched after paragraph [0094] below.
[0094] The determination of the at least two decorrelation filtering processes for the audio data can be based, at least in part, on the number M of intermediate audio channels. The decorrelation filtering processes can be determined based, at least in part, on N-to-K, M-to-K, or N-to-M mixing equations.
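A direct restatement of the N-to-M-to-K flow of paragraph [0093] as code. The mixing matrices and the pluggable `decorrelate` callable are illustrative assumptions:

```python
import numpy as np

def decorrelate_via_intermediate(audio_n: np.ndarray,
                                 mix_n_to_m: np.ndarray,
                                 mix_m_to_k: np.ndarray,
                                 decorrelate) -> np.ndarray:
    """Downmix (or upmix) N input channels to M intermediate channels,
    decorrelate at the intermediate layout, then mix to K outputs.

    audio_n: input audio, shape (N, n_samples).
    mix_n_to_m: N-to-M mixing matrix, shape (M, N).
    mix_m_to_k: M-to-K mixing matrix, shape (K, M).
    decorrelate: any per-channel decorrelation routine, e.g. one of the
        earlier sketches applied channel by channel.
    """
    audio_m = mix_n_to_m @ audio_n        # (M, n_samples)
    decorr_m = decorrelate(audio_m)       # decorrelation at M channels
    return mix_m_to_k @ decorr_m          # (K, n_samples)
```

Consistent with paragraph [0094], the decorrelation filters used inside `decorrelate` could themselves be chosen based on M and on the M-to-K or N-to-M mixing equations.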
[0095] The software may include instructions for controlling the apparatus to perform an ICC control process among a plurality of audio channel pairs. The ICC control process may involve receiving an ICC value and/or determining an ICC value based, at least in part, on the spatial parameter data. The ICC control process may involve at least one of receiving a set of ICC values or determining the set of ICC values based, at least in part, on the spatial parameter data. The software may include instructions for controlling the apparatus to perform processes of determining a set of IDC values based, at least in part, on the set of ICC values and synthesizing a set of channel-specific decorrelation signals that correspond to the set of IDC values by performing operations on the filtered audio data.
[0096] The process of applying the decorrelation filtering processes to at least a portion of the audio data may involve applying the same decorrelation filter to audio data for a plurality of channels to produce the filtered audio data and multiplying the filtered audio data corresponding to a left channel or a right channel by -1. The software may include instructions for controlling the apparatus to perform processes of reversing a polarity of the filtered audio data corresponding to a left surround channel with reference to the filtered audio data corresponding to the left channel and reversing a polarity of the filtered audio data corresponding to a right surround channel with reference to the filtered audio data corresponding to the right channel.
[0097] The process of applying the decorrelation filter to a portion of the audio data may involve applying a first decorrelation filter to audio data for a first and a second channel to produce first channel filtered data and second channel filtered data, and applying a second decorrelation filter to audio data for a third and a fourth channel to produce third channel filtered data and fourth channel filtered data. The first channel can be a left channel, the second channel can be a right channel, the third channel can be a left surround channel, and the fourth channel can be a right surround channel.
[0098] The software may include instructions for controlling the apparatus to perform processes of reversing a polarity of the first channel filtered data relative to the second channel filtered data and reversing a polarity of the third channel filtered data relative to the fourth channel filtered data. The process of determining at least two decorrelation filtering processes for the audio data may involve determining that a different decorrelation filter will be applied to the audio data for a center channel or determining that a decorrelation filter will not be applied to the audio data for the center channel.
[0099] The software may include instructions for controlling the apparatus to receive channel-specific scaling factors and a coupling channel signal corresponding to a plurality of coupled channels. The application process may involve applying at least one of the decorrelation filtering processes to the coupling channel to generate channel-specific filtered audio data and applying the channel-specific scaling factors to the channel-specific filtered audio data to produce the channel-specific decorrelation signals.
[00100] The software may include instructions for controlling the apparatus to determine decorrelation signal synthesis parameters based, at least in part, on the spatial parameter data. The decorrelation signal synthesis parameters can be output-channel-specific decorrelation signal synthesis parameters.
The software may include instructions for controlling the apparatus to receive a coupling channel signal corresponding to a plurality of coupled channels and channel-specific scaling factors. At least one of the processes of determining at least two decorrelation filtering processes for the audio data and applying the decorrelation filtering processes to a portion of the audio data may involve: generating a set of seed decorrelation signals by applying a set of decorrelation filters to the coupling channel signal; sending the seed decorrelation signals to a synthesizer; applying the output-channel-specific decorrelation signal synthesis parameters to the seed decorrelation signals received by the synthesizer to produce channel-specific synthesized decorrelation signals; multiplying the channel-specific synthesized decorrelation signals by the appropriate channel-specific scaling factors for each channel to produce channel-specific scaled synthesized decorrelation signals; and outputting the channel-specific scaled synthesized decorrelation signals to a direct signal and decorrelation signal mixer.
[00101] The software may include instructions for controlling the apparatus to receive a coupling channel signal corresponding to a plurality of coupled channels and channel-specific scaling factors. At least one of the processes of determining at least two decorrelation filtering processes for the audio data and applying the decorrelation filtering processes to a portion of the audio data may involve: generating a set of channel-specific seed decorrelation signals by applying a set of decorrelation filters to channel-specific audio data; sending the channel-specific seed decorrelation signals to a synthesizer; determining channel-pair-specific level adjustment parameters based, at least in part, on the channel-specific scaling factors; applying the output-channel-specific decorrelation signal synthesis parameters and the channel-pair-specific level adjustment parameters to the channel-specific seed decorrelation signals received by the synthesizer to produce channel-specific synthesized decorrelation signals; and outputting the channel-specific synthesized decorrelation signals to a direct signal and decorrelation signal mixer.
[00102] Determining the output-channel-specific decorrelation signal synthesis parameters may involve determining a set of IDC values based, at least in part, on the spatial parameter data, and determining output-channel-specific decorrelation signal synthesis parameters that correspond with the set of IDC values. The set of IDC values can be determined, at least in part, according to a coherence between individual discrete channels and a coupling channel and a coherence between pairs of individual discrete channels.
[00103] In some implementations, a method may involve: receiving audio data comprising a first set of frequency coefficients and a second set of frequency coefficients; estimating, based on at least part of the first set of frequency coefficients, spatial parameters for at least part of the second set of frequency coefficients; and applying the estimated spatial parameters to the second set of frequency coefficients to generate a modified second set of frequency coefficients. The first set of frequency coefficients may correspond to a first frequency range and the second set of frequency coefficients may correspond to a second frequency range. The first frequency range may be below the second frequency range.
[00104] The audio data may include data corresponding to individual channels and to a coupled channel. The first frequency range may correspond to an individual channel frequency range and the second frequency range may correspond to a coupled channel frequency range. The application process may involve applying the estimated spatial parameters on a per-channel basis. [00105] The audio data may include frequency coefficients in the first frequency range for two or more channels. The estimation process may involve calculating combined frequency coefficients of a composite coupling channel based on the frequency coefficients of the two or more channels and computing, for at least a first channel, cross-correlation coefficients between the frequency coefficients of the first channel and the combined frequency coefficients. The combined frequency coefficients can correspond to the first frequency range. [00106] The cross-correlation coefficients can be normalized cross-correlation coefficients. The first set of frequency coefficients may include audio data for a plurality of channels. The estimation process may involve estimating normalized cross-correlation coefficients for multiple channels of the plurality of channels. The estimation process may involve dividing at least part of the first frequency range into bands and computing a normalized cross-correlation coefficient for each band of the first frequency range. [00107] In some deployments, the estimation process may involve averaging the normalized cross-correlation coefficients across all bands of the first frequency range of a channel and applying a scaling factor to the average of the normalized cross-correlation coefficients to obtain the estimated spatial parameters for the channel (a sketch of this procedure follows below). The process of averaging the normalized cross-correlation coefficients may involve averaging over a time segment of a channel. The scaling factor may decrease with increasing frequency. [00108] The method may involve adding noise to model the variance of the estimated spatial parameters. The variance of the added noise can be based, at least in part, on the variance in the normalized cross-correlation coefficients. The variance of the added noise may also depend, at least in part, on a prediction of the spatial parameter across bands, with the dependence of the variance on the prediction being based on empirical data. [00109] The method may involve receiving or determining pitch information related to the second set of frequency coefficients. The applied noise may vary depending on the pitch information. [00110] The method may involve measuring per-band energy ratios between bands of the first set of frequency coefficients and bands of the second set of frequency coefficients. The estimated spatial parameters may vary according to the per-band energy ratios. In some deployments, the estimated spatial parameters may vary with temporal changes of the incoming audio signals. The estimation process may involve operations only on real-valued frequency coefficients.
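As a concrete illustration of the estimation outlined in paragraphs [00105] to [00107], the sketch below forms a composite coupling channel, computes per-band normalized cross-correlations against it for each channel, averages across bands, and applies a scaling factor. The function and variable names, the simple summation used to form the composite channel, and the constant scale factor are illustrative assumptions (the text notes the factor may instead decrease with frequency).

```python
import numpy as np

def estimate_spatial_parameters(channels, bands, scale=0.8):
    """channels: (n_channels, n_bins) low-band coefficients of the coupled
    channels; bands: list of (start, stop) bin ranges within the first
    frequency range. Returns one estimated parameter per channel."""
    channels = np.asarray(channels, float)
    composite = channels.sum(axis=0)  # composite coupling channel
    estimates = []
    for ch in channels:
        per_band = []
        for b0, b1 in bands:
            x, y = ch[b0:b1], composite[b0:b1]
            denom = np.sqrt(np.dot(x, x) * np.dot(y, y)) + 1e-12
            per_band.append(np.dot(x, y) / denom)  # normalized cross-correlation
        estimates.append(scale * np.mean(per_band))  # band average, then scale
    return np.array(estimates)
```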
[00111] The process of applying the estimated spatial parameters to the second set of frequency coefficients can be part of a decorrelation process. In some deployments, the decorrelation process may involve generating a reverberation signal or a decorrelation signal and applying it to the second set of frequency coefficients. The decorrelation process may involve applying a decorrelation algorithm that operates entirely on real-valued coefficients. The decorrelation process may involve selective or signal-adaptive decorrelation of specific channels. The decorrelation process may involve selective or signal-adaptive decorrelation of specific frequency bands. In some implementations, the first and second sets of frequency coefficients may be the result of applying a modified discrete sine transform, a modified discrete cosine transform, or an orthogonal superimposed transform to audio data in a time domain. [00112] The estimation process can be based, at least in part, on estimation theory. For example, the estimation process may be based, at least in part, on at least one of a maximum likelihood method, a Bayes estimator, a method-of-moments estimator, a least mean square error estimator, or a minimum variance unbiased estimator. [00113] In some deployments, the audio data may be received in a bitstream encoded according to a legacy encoding process. The legacy encoding process can, for example, be an AC-3 audio codec process or an Enhanced AC-3 audio codec process. Applying the spatial parameters can yield more spatially accurate audio reproduction than that obtained by decoding the bitstream according to a legacy decoding process that corresponds to the legacy encoding process. [00114] Some deployments involve an apparatus that includes an interface and a logic system. The logic system may be configured to: receive audio data comprising a first set of frequency coefficients and a second set of frequency coefficients; estimate, based on at least part of the first set of frequency coefficients, spatial parameters for at least part of the second set of frequency coefficients; and apply the estimated spatial parameters to the second set of frequency coefficients to generate a modified second set of frequency coefficients. [00115] The apparatus may include a memory device. The interface can be an interface between the logic system and the memory device. Alternatively, the interface can be a network interface. [00116] The first set of frequency coefficients may correspond to a first frequency range and the second set of frequency coefficients may correspond to a second frequency range. The first frequency range can be below the second frequency range. The audio data can include data corresponding to individual channels and to a coupled channel. The first frequency range may correspond to an individual channel frequency range and the second frequency range may correspond to a coupled channel frequency range. [00117] The application process may involve applying the estimated spatial parameters on a per-channel basis. The audio data can include frequency coefficients in the first frequency range for two or more channels. The estimation process may involve calculating combined frequency coefficients of a composite coupling channel based on frequency coefficients of the two or more channels and computing, for at least a first channel, cross-correlation coefficients between frequency coefficients of the first channel and the combined frequency coefficients. [00118] The combined frequency coefficients may correspond to the first frequency range. The cross-correlation coefficients can be normalized cross-correlation coefficients. The first set of frequency coefficients may include audio data for a plurality of channels. The estimation process may involve estimating normalized cross-correlation coefficients for multiple channels of the plurality of channels.
[00119] The estimation process may involve dividing the second frequency range into bands and computing a normalized cross-correlation coefficient for each band of the second frequency range. The estimation process may involve dividing the first frequency range into bands, averaging the normalized cross-correlation coefficients across all bands of the first frequency range, and applying a scaling factor to the average of the normalized cross-correlation coefficients to obtain the estimated spatial parameters. [00120] The process of averaging the normalized cross-correlation coefficients may involve averaging across a time segment of a channel. The logic system can be further configured to add noise to the modified second set of frequency coefficients. The noise can be added to model a variance of the estimated spatial parameters (see the sketch following this passage). The variance of the noise added by the logic system can be based, at least in part, on a variance in the normalized cross-correlation coefficients. The logic system can be further configured to receive or determine pitch information associated with the second set of frequency coefficients and vary the applied noise accordingly. [00121] In some deployments, the audio data may be received in a bitstream encoded according to a legacy encoding process. For example, the legacy encoding process can be an AC-3 audio codec process or an Enhanced AC-3 audio codec process. [00122] Some aspects of this disclosure may be deployed on non-transient media that have software stored thereon. The software may include instructions for controlling an apparatus to: receive audio data comprising a first set of frequency coefficients and a second set of frequency coefficients; estimate, based on at least part of the first set of frequency coefficients, spatial parameters for at least part of the second set of frequency coefficients; and apply the estimated spatial parameters to the second set of frequency coefficients to generate a modified second set of frequency coefficients. [00123] The first set of frequency coefficients may correspond to a first frequency range and the second set of frequency coefficients may correspond to a second frequency range. The audio data can include data corresponding to individual channels and to a coupled channel. The first frequency range may correspond to an individual channel frequency range and the second frequency range may correspond to a coupled channel frequency range. The first frequency range can be below the second frequency range. [00124] The application process may involve applying the estimated spatial parameters on a per-channel basis. The audio data can include frequency coefficients in the first frequency range for two or more channels. The estimation process may involve calculating combined frequency coefficients of a composite coupling channel based on frequency coefficients of the two or more channels and computing, for at least a first channel, cross-correlation coefficients between frequency coefficients of the first channel and the combined frequency coefficients.
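Paragraph [00120] above has the logic system add noise to the modified coefficients to model the variance of the estimated spatial parameters. A minimal sketch, assuming zero-mean Gaussian noise and an illustrative gain constant (neither is specified by this disclosure), with the noise level tied to the spread of the per-band normalized cross-correlations:

```python
import numpy as np

rng = np.random.default_rng(1)

def add_modeling_noise(modified_coeffs, per_band_cc, noise_gain=0.1):
    """modified_coeffs: the modified second set of frequency coefficients;
    per_band_cc: normalized cross-correlation coefficients per band.
    The noise standard deviation scales with the spread of the per-band
    coefficients (noise_gain is an illustrative tuning constant)."""
    coeffs = np.asarray(modified_coeffs, float)
    sigma = noise_gain * np.std(per_band_cc) * np.std(coeffs)
    return coeffs + rng.normal(0.0, sigma, size=coeffs.shape)
```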
[00125] The combined frequency coefficients may correspond to the first frequency range. The cross-correlation coefficients can be normalized cross-correlation coefficients. The first set of frequency coefficients may include audio data for a plurality of channels. The estimation process may involve estimating normalized cross-correlation coefficients for multiple channels of the plurality of channels. The estimation process may involve dividing the second frequency range into bands and computing a normalized cross-correlation coefficient for each band of the second frequency range. [00126] The estimation process may involve: dividing the first frequency range into bands; averaging the normalized cross-correlation coefficients across all bands of the first frequency range; and applying a scaling factor to the average of the normalized cross-correlation coefficients to obtain the estimated spatial parameters. The process of averaging the normalized cross-correlation coefficients may involve averaging across a time segment of a channel. [00127] The software may also include instructions for controlling the decoding apparatus to add noise to the modified second set of frequency coefficients in order to model a variance of the estimated spatial parameters. The variance of the added noise can be based, at least in part, on a variance in the normalized cross-correlation coefficients. The software may also include instructions for controlling the decoding apparatus to receive or determine pitch information associated with the second set of frequency coefficients. The applied noise may vary depending on the pitch information. [00128] In some deployments, the audio data may be received in a bitstream encoded according to a legacy encoding process. For example, the legacy encoding process can be an AC-3 audio codec process or an Enhanced AC-3 audio codec process. [00129] According to some implementations, a method may involve: receiving audio data corresponding to a plurality of audio channels; determining audio characteristics of the audio data; determining decorrelation filter parameters for the audio data based, at least in part, on the audio characteristics; forming a decorrelation filter in accordance with the decorrelation filter parameters; and applying the decorrelation filter to at least some of the audio data. For example, the audio characteristics may include pitch information and/or transient information. [00130] Determining the audio characteristics may involve receiving explicit pitch information or transient information with the audio data. Determining the audio characteristics may involve determining pitch information or transient information based on one or more attributes of the audio data. [00131] In some deployments, the decorrelation filter may include a linear filter with at least one delay element. The decorrelation filter may include an all-pass filter. [00132] The decorrelation filter parameters can include dithering parameters or randomly selected pole locations for at least one pole of the all-pass filter. For example, the dithering parameters or pole locations might involve a maximum step value for pole movement. The maximum step value can be substantially zero for highly tonal audio data signals. The dithering parameters or pole locations can be bounded by restricted areas within which pole movements are confined. In some deployments, the restricted areas can be circles or rings. In some deployments, the restricted areas may be fixed. In some deployments, different channels of audio data may share the same restricted areas. [00133] According to some implementations, the poles can be dithered independently for each channel. In some deployments, pole movements may not be bounded by restricted areas. In some deployments, the poles may maintain a substantially consistent spatial or angular relationship relative to each other.
According to some implementations, a pole's distance from the center of a circle in the z-plane may be a function of audio data frequency. [00134] In some deployments, an apparatus may include an interface and a logic system. In some deployments, the logic system may include a general purpose single-chip or multi-chip processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic and/or discrete hardware components. [00135] The logic system can be configured to receive, from the interface, audio data corresponding to a plurality of audio channels and to determine audio characteristics of the audio data. In some deployments, the audio characteristics may include pitch information and/or transient information. The logic system can be configured to determine decorrelation filter parameters for the audio data based, at least in part, on the audio characteristics, form a decorrelation filter according to the decorrelation filter parameters, and apply the decorrelation filter to at least some of the audio data. [00136] The decorrelation filter may include a linear filter with at least one delay element. The decorrelation filter parameters can include dithering parameters or randomly selected pole locations for at least one pole of the decorrelation filter. The dithering parameters or pole locations can be bounded by restricted areas within which pole movements are confined. The dithering parameters or pole locations can be determined by reference to a maximum step value for pole movement. The maximum step value can be substantially zero for highly tonal audio data signals. [00137] The apparatus may include a memory device. The interface can be an interface between the logic system and the memory device. Alternatively, the interface can be a network interface. [00138] Some aspects of this disclosure may be deployed on non-transient media that have software stored thereon. The software may include instructions for controlling an apparatus to: receive audio data corresponding to a plurality of audio channels; determine audio characteristics of the audio data, the audio characteristics comprising at least one of pitch information or transient information; determine decorrelation filter parameters for the audio data based, at least in part, on the audio characteristics; form a decorrelation filter in accordance with the decorrelation filter parameters; and apply the decorrelation filter to at least some of the audio data. The decorrelation filter may include a linear filter with at least one delay element. [00139] The decorrelation filter parameters can include dithering parameters or randomly selected pole locations for at least one pole of the decorrelation filter. The dithering parameters or pole locations can be bounded by restricted areas within which pole movements are confined. The dithering parameters or pole locations can be determined by reference to a maximum step value for pole movement. The maximum step value can be substantially zero for highly tonal audio data signals.
[00140] According to some implementations, a method may involve: receiving audio data corresponding to a plurality of audio channels; determining decorrelation filter control information corresponding to a maximum pole shift of a decorrelation filter; determining decorrelation filter parameters for the audio data based, at least in part, on the decorrelation filter control information; forming the decorrelation filter according to the decorrelation filter parameters; and applying the decorrelation filter to at least some of the audio data. [00141] The audio data can be in the time domain or in the frequency domain. Determining the decorrelation filter control information may involve receiving an express indication of the maximum pole shift. [00142] Determining the decorrelation filter control information may involve determining audio characteristic information and determining the maximum pole shift based, at least in part, on the audio characteristic information. In some deployments, the audio characteristic information may include at least one of pitch information or transient information. [00143] Details of one or more implementations of the subject matter described in this specification are presented in the attached drawings and in the description below. Other features, aspects, and advantages will become apparent from the description, drawings, and claims. It should be noted that the relative dimensions of the Figures below may not be drawn to scale. BRIEF DESCRIPTION OF THE DRAWINGS [00144] Figures 1A and 1B are graphs showing examples of channel coupling during an audio encoding process. [00145] Figure 2A is a block diagram illustrating elements of an audio processing system. [00146] Figure 2B provides an overview of the operations that can be performed by the audio processing system of Figure 2A. [00147] Figure 2C is a block diagram showing elements of an alternative audio processing system. [00148] Figure 2D is a block diagram showing an example of how a decorrelator can be used in an audio processing system. [00149] Figure 2E is a block diagram illustrating elements of an alternative audio processing system. [00150] Figure 2F is a block diagram showing examples of decorrelator elements. [00151] Figure 3 is a flowchart illustrating an example of a decorrelation process. [00152] Figure 4 is a block diagram that illustrates examples of decorrelator components that can be configured to perform the decorrelation process of Figure 3. [00153] Figure 5A is a graph showing an example of pole movement of an all-pass filter. [00154] Figures 5B and 5C are graphs showing alternative examples of pole movement of an all-pass filter. [00155] Figures 5D and 5E are graphs that show alternative examples of restricted areas that can be applied when moving the poles of an all-pass filter. [00156] Figure 6A is a block diagram illustrating an alternative implementation of a decorrelator. [00157] Figure 6B is a block diagram illustrating another deployment of a decorrelator. [00158] Figure 6C illustrates an alternative implementation of an audio processing system. [00159] Figures 7A and 7B are vector diagrams that provide a simplified illustration of spatial parameters. [00160] Figure 8A is a flowchart illustrating blocks of some decorrelation methods provided in this document. [00161] Figure 8B is a flowchart illustrating blocks of a lateral sign-flip method. [00162] Figures 8C and 8D are block diagrams that illustrate components that can be used to implement some sign-flip methods.
[00163] Figure 8E is a flowchart illustrating blocks of a method of determining synthesizing coefficients and mixing coefficients from spatial parameter data. [00164] Figure 8F is a block diagram showing examples of mixer components. [00165] Figure 9 is a flowchart that outlines a process of synthesizing decorrelation signals in multichannel cases. [00166] Figure 10A is a flowchart that provides an overview of a method for estimating spatial parameters. [00167] Figure 10B is a flowchart that provides an overview of an alternative method for estimating spatial parameters. [00168] Figure 10C is a graph indicating the relationship between the scaling term VB and the band index l. [00169] Figure 10D is a graph that indicates the relationship between the variables VM and q. [00170] Figure 11A is a flowchart that outlines some transient determination methods and transient-related controls. [00171] Figure 11B is a block diagram that includes examples of various components for transient determination and transient-related controls. [00172] Figure 11C is a flowchart that outlines some methods of determining transient control values based, at least in part, on temporal power variations of the audio data. [00173] Figure 11D is a graph illustrating an example of mapping raw transient values to transient control values. [00174] Figure 11E is a flowchart that outlines a method of encoding transient information. [00175] Figure 12 is a block diagram that provides examples of components of an apparatus that can be configured to implement aspects of the processes described in this document. [00176] Similar reference numerals and designations in the various drawings indicate the same elements. DESCRIPTION OF EXAMPLE EMBODIMENTS [00177] The following description is directed to certain deployments for the purposes of describing some innovative aspects of this disclosure, as well as examples of contexts in which these innovative aspects can be deployed. However, the teachings in the present document can be applied in a number of different ways. Although the examples provided in this application are primarily described in terms of the AC-3 audio codec and the Enhanced AC-3 audio codec (also known as E-AC-3), the concepts provided in this document apply to other codecs, including but not limited to MPEG-2 AAC and MPEG-4 AAC. Furthermore, the described deployments may be incorporated into various audio processing devices, including but not limited to encoders and/or decoders, which may be included in mobile phones, smart phones, desktop computers, handheld or portable computers, netbook-type computers, notebook-type computers, smartbook-type computers, tablet computers, stereo systems, televisions, DVD players, digital recording devices, and a variety of other devices. Accordingly, the teachings of this disclosure are not intended to be limited to the deployments shown in the Figures and/or described herein, but rather to have broad applicability. [00178] Some audio codecs, including the AC-3 and E-AC-3 audio codecs (proprietary implementations of which are licensed as "Dolby Digital" and "Dolby Digital Plus"), employ some form of channel coupling to exploit cross-channel redundancies, encode data more efficiently, and reduce the encoding bitrate.
For example, with the AC-3 and E-AC-3 codecs, in a coupling channel frequency range beyond a specific "coupling start frequency", the modified discrete cosine transform (MDCT) coefficients of the discrete channels (also referred to herein as "individual channels") undergo channel reduction to a mono channel, which may be referred to herein as a "composite channel" or a "coupling channel". Some codecs can form two or more coupling channels. [00179] The AC-3 and E-AC-3 decoders widen the mono coupling channel signal back into the discrete channels using scaling factors based on the coupling coordinates sent in the bit stream. In this way, the decoder restores the high-frequency envelope, but not the phase, of the audio data in the coupling channel frequency range of each channel. [00180] Figures 1A and 1B are graphs showing examples of channel coupling during an audio encoding process. Graph 102 of Figure 1A indicates an audio signal that corresponds to a left channel before channel coupling. Graph 104 indicates an audio signal that corresponds to a right channel before channel coupling. Figure 1B shows the left and right channels after encoding, including channel coupling, and decoding. In this simplified example, graph 106 indicates that the audio data for the left channel is substantially unchanged, while graph 108 indicates that the audio data for the right channel is now in phase with the audio data for the left channel. [00181] As shown in Figures 1A and 1B, the signal decoded beyond the coupling start frequency can be coherent across channels. Consequently, the decoded signal beyond the coupling start frequency may sound spatially collapsed as compared to the original signal. When decoded channels undergo channel reduction, for example, in binaural rendering via headphone virtualization or playback on stereo speakers, coupled channels add up coherently. This can lead to a timbre mismatch when compared to the original reference signal. The negative effects of channel coupling can be particularly evident when the decoded signal is binaurally rendered in headphones. [00182] Various deployments described in this document may mitigate these effects, at least in part. Some such deployments involve innovative audio encoding and/or decoding tools. Such deployments can be configured to restore the phase diversity of the output channels in frequency regions encoded by channel coupling. According to various implementations, a decorrelated signal can be synthesized from the decoded spectral coefficients in the coupling channel frequency range of each output channel. [00183] However, many other types of audio processing devices and methods are described in this document. Figure 2A is a block diagram illustrating elements of an audio processing system. In this implementation, the audio processing system 200 includes a buffer 201, a switch 203, a decorrelator 205, and an inverse transform module 255. The switch 203 can, for example, be a crosspoint switch. The buffer 201 receives audio data elements 220a to 220n, directs the audio data elements 220a to 220n to the switch 203, and sends copies of the audio data elements 220a to 220n to the decorrelator 205. [00184] In this example, the audio data elements 220a to 220n correspond to a plurality of audio channels 1 to N. Here, the audio data elements 220a to 220n include frequency domain representations that correspond to filter bank coefficients of an audio encoding or processing system, which may be a legacy audio encoding or processing system.
However, in alternative implementations, the audio data elements 220a to 220n may correspond to a plurality of frequency bands 1 to N. [00185] In this implementation, all audio data elements 220a to 220n are received by both the switch 203 and the decorrelator 205. Here, all audio data elements 220a to 220n are processed by the decorrelator 205 to produce decorrelated audio data elements 230a to 230n. Furthermore, all decorrelated audio data elements 230a to 230n are received by the switch 203. [00186] However, not all of the decorrelated audio data elements 230a to 230n are received by the inverse transform module 255 and converted to time domain audio data 260. Instead, the switch 203 selects which of the decorrelated audio data elements 230a to 230n will be received by the inverse transform module 255. In this example, the switch 203 selects, according to the channel, which of the audio data elements 230a to 230n will be received by the inverse transform module 255. Here, for example, the audio data element 230a is received by the inverse transform module 255, while the audio data element 230n is not. Instead, the switch 203 sends the audio data element 220n, which has not been processed by the decorrelator 205, to the inverse transform module 255. [00187] In some implementations, the switch 203 may determine whether to send a direct audio data element 220 or a decorrelated audio data element 230 to the inverse transform module 255 according to predetermined settings corresponding to channels 1 through N. Alternatively or additionally, the switch 203 may determine whether to send an audio data element 220 or a decorrelated audio data element 230 to the inverse transform module 255 in accordance with channel-specific components of the selection information 207, which may be generated or stored locally, or received with the audio data 220. Accordingly, the audio processing system 200 can provide selective decorrelation of specific audio channels. [00188] Alternatively or additionally, the switch 203 may determine whether to send a direct audio data element 220 or a decorrelated audio data element 230 to the inverse transform module 255 in accordance with changes in the audio data 220. For example, the switch 203 can determine which, if any, of the decorrelated audio data elements 230 are sent to the inverse transform module 255 in accordance with signal-adaptive components of the selection information 207, which can indicate transients or pitch changes in the audio data 220. In alternative implementations, the switch 203 may receive such signal-adaptive information from the decorrelator 205. In still other implementations, the switch 203 may itself be configured to determine changes in the audio data, such as transients or pitch changes. Accordingly, the audio processing system 200 can provide signal-adaptive decorrelation of specific audio channels. [00189] As noted above, in some deployments, the audio data elements 220a to 220n may correspond to a plurality of frequency bands 1 to N. In some such deployments, the switch 203 may determine whether to send an audio data element 220 or a decorrelated audio data element 230 to the inverse transform module 255 according to predetermined settings corresponding to the frequency bands and/or according to received selection information 207. Consequently, the audio processing system 200 can provide selective decorrelation of specific frequency bands.
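The per-channel (or per-band) selection performed by the switch 203 reduces to a simple multiplex. A toy model, with boolean flags standing in for the channel-specific components of the selection information 207 (an illustrative simplification, not the normative switch):

```python
def select_for_inverse_transform(direct, decorrelated, use_decorrelated):
    """direct, decorrelated: per-channel coefficient arrays (the elements
    220a..220n and 230a..230n in the terms of Figure 2A).
    use_decorrelated: one boolean per channel, derived from predetermined
    settings or from signal-adaptive selection information."""
    return [dec if flag else dry
            for dry, dec, flag in zip(direct, decorrelated, use_decorrelated)]
```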
[00190] Alternatively or additionally, the switch 203 may determine whether to send a direct audio data element 220 or a decorrelated audio data element 230 to the inverse transform module 255 in accordance with changes in the audio data 220, which may be indicated by the selection information 207 or by information received from the decorrelator 205. In some deployments, the switch 203 may be configured to determine changes to the audio data. Therefore, the audio processing system 200 can provide signal-adaptive decorrelation of specific frequency bands. [00191] Figure 2B provides an overview of the operations that can be performed by the audio processing system of Figure 2A. In this example, method 270 begins with a process of receiving audio data corresponding to a plurality of audio channels (block 272). The audio data may include a frequency domain representation corresponding to filter bank coefficients of an audio encoding or processing system. The audio encoding or processing system may, for example, be a legacy audio encoding or processing system such as AC-3 or E-AC-3. Some deployments may involve receiving control mechanism elements in a bit stream produced by the legacy audio encoding or processing system, such as block switching indications, etc. The decorrelation process may be based, at least in part, on such control mechanism elements. Detailed examples are provided below. In this example, method 270 also involves applying a decorrelation process to at least some of the audio data (block 274). The decorrelation process can be performed with the same filter bank coefficients used by the audio encoding or processing system. [00192] Referring again to Figure 2A, the decorrelator 205 can perform various types of decorrelation operations, depending on the particular implementation. Many examples are provided in this document. In some implementations, the decorrelation process is performed without converting coefficients from the frequency domain representation of the audio data elements 220 to another frequency domain or time domain representation. The decorrelation process may involve generating reverberation signals or decorrelation signals by applying linear filters to at least a portion of the frequency domain representation. In some deployments, the decorrelation process may involve applying a decorrelation algorithm that operates entirely on real-valued coefficients. As used herein, "real-valued" means the use of only a cosine-modulated or a sine-modulated filter bank, but not both. [00193] The decorrelation process may involve applying a decorrelation filter to a portion of the received audio data elements 220a to 220n to produce filtered audio data elements. The decorrelation process may involve using a non-hierarchical mixer to combine a direct portion of the received audio data (to which no decorrelation filter has been applied) with the filtered audio data according to spatial parameters. For example, a direct portion of the audio data element 220a may be mixed with a filtered portion of the audio data element 220a in an output-channel-specific way. Some deployments may include an output-channel-specific combiner (e.g., a linear combiner) of decorrelation or reverberation signals. Several examples are described below. [00194] In some implementations, the spatial parameters may be determined by the audio processing system 200 through analysis of the received audio data 220.
Alternatively or additionally, the spatial parameters may be received in a bit stream, along with the audio data 220, as part or all of the decorrelation information 240. In some implementations, the decorrelation information 240 may include correlation coefficients between individual discrete channels and a coupling channel, correlation coefficients between individual discrete channels, explicit tonality information and/or transient information. The decorrelation process may involve decorrelating at least a portion of the audio data 220 based, at least in part, on the decorrelation information 240. Some deployments may be configured to use both locally determined and received spatial parameters and/or other decorrelation information. Several examples are described below. [00195] Figure 2C is a block diagram showing elements of an alternative audio processing system. In this example, the audio data elements 220a to 220n include audio data for N audio channels. The audio data elements 220a to 220n include frequency domain representations corresponding to filter bank coefficients of an audio encoding or processing system. In this deployment, the frequency domain representations are the result of applying a critically sampled, perfect-reconstruction filter bank. For example, the frequency domain representations can be the result of applying a modified discrete sine transform, a modified discrete cosine transform, or an orthogonal superimposed transform to audio data in a time domain. [00196] The decorrelator 205 applies a decorrelation process to at least a portion of the audio data elements 220a to 220n. For example, the decorrelation process may involve generating reverberation signals or decorrelation signals by applying linear filters to at least a portion of the audio data elements 220a to 220n. The decorrelation process can be performed, at least in part, in accordance with decorrelation information 240 received by the decorrelator 205. For example, the decorrelation information 240 can be received in a bit stream along with the frequency domain representations of the audio data elements 220a to 220n. Alternatively or additionally, at least some decorrelation information may be determined locally, for example, by the decorrelator 205. [00197] The inverse transform module 255 applies an inverse transform to produce the time domain audio data 260. In this example, the inverse transform module 255 applies an inverse transform equivalent to a critically sampled, perfect-reconstruction filter bank. The critically sampled, perfect-reconstruction filter bank can match that applied to the audio data in the time domain (e.g., by an encoding device) to produce the frequency domain representations of the audio data elements 220a to 220n. [00198] Figure 2D is a block diagram showing an example of how a decorrelator can be used in an audio processing system. In this example, the audio processing system 200 is a decoder that includes a decorrelator 205. In some deployments, the decoder can be configured to work according to the AC-3 or E-AC-3 audio codec. However, in some deployments, the audio processing system may be configured to process audio data for other audio codecs. The decorrelator 205 may include various subcomponents, such as those described elsewhere in this document. In this example, a channel enhancer 225 receives audio data 210, which includes frequency domain representations of audio data from a coupling channel. The frequency domain representations are MDCT coefficients in this example.
[00199] The channel enhancer 225 also receives coupling coordinates 212 for each channel in the coupling channel frequency range. In this deployment, the scaling information, in the form of the coupling coordinates 212, was computed in a Dolby Digital or Dolby Digital Plus encoder in exponent-and-mantissa form. The channel enhancer 225 can compute frequency coefficients for each output channel by multiplying the coupling channel frequency coefficients by the coupling coordinates for that channel. [00200] In this implementation, the channel enhancer 225 outputs the decoupled MDCT coefficients of the individual channels in the coupling channel frequency range to the decorrelator 205. Consequently, in this example, the audio data 220 that is input to the decorrelator 205 includes MDCT coefficients. [00201] In the example shown in Figure 2D, the decorrelated audio data 230 output by the decorrelator 205 includes decorrelated MDCT coefficients. In this example, not all of the audio data received by the audio processing system 200 is decorrelated by the decorrelator 205. For example, the frequency domain representations of the audio data 245a, for frequencies below the coupling channel frequency range, as well as the frequency domain representations of the audio data 245b, for frequencies above the coupling channel frequency range, are not decorrelated by the decorrelator 205. These data, along with the decorrelated MDCT coefficients 230 that are output from the decorrelator 205, are input to an inverse MDCT process 255. In this example, the audio data 245b includes MDCT coefficients determined by the Spectral Extension tool, an audio bandwidth extension tool of the E-AC-3 audio codec. [00202] In this example, decorrelation information 240 is received by the decorrelator 205. The type of decorrelation information 240 received may vary depending on the implementation. In some deployments, the decorrelation information 240 may include explicit decorrelator-specific control information and/or explicit information that may form the basis of such control information. The decorrelation information 240 may, for example, include spatial parameters such as correlation coefficients between individual discrete channels and a coupling channel and/or correlation coefficients between individual discrete channels. Such explicit decorrelation information 240 may also include explicit pitch information and/or transient information. This information can be used to determine, at least in part, the decorrelation filter parameters for the decorrelator 205. [00203] However, in alternative implementations, no such explicit decorrelation information 240 is received by the decorrelator 205. According to some such implementations, the decorrelation information 240 may include information from a bitstream of a legacy audio codec. For example, the decorrelation information 240 may include time segmentation information that is available in a bit stream encoded according to the AC-3 audio codec or the E-AC-3 audio codec. The decorrelation information 240 may include coupling-in-use information, block switching information, exponent information, exponent strategy information, etc. Such information may have been received by an audio processing system in a bit stream along with the audio data 210.
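Tying together paragraphs [00199] and [00200]: the decoupling step amounts to a per-channel scaling of the mono coupling channel coefficients. A rough sketch, assuming the coupling coordinates have already been decoded from their exponent-and-mantissa form into linear scale factors:

```python
import numpy as np

def decouple_channels(coupling_mdct, coupling_coords):
    """coupling_mdct: (n_bins,) mono coupling-channel MDCT coefficients in
    the coupling channel frequency range.
    coupling_coords: (n_channels, n_bins) linear per-channel scale factors.
    Each output channel is the coupling channel scaled by its coordinates,
    which restores the high-frequency envelope but not the phase."""
    mono = np.asarray(coupling_mdct, float)
    return np.asarray(coupling_coords, float) * mono[np.newaxis, :]
```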
[00204] In some deployments, the decorrelator 205 (or another element of the audio processing system 200) may determine spatial parameters, pitch information, and/or transient information based on one or more attributes of the audio data. For example, the audio processing system 200 can determine spatial parameters for frequencies in the coupling channel frequency range based on the audio data 245a or 245b outside the coupling channel frequency range. Alternatively or additionally, the audio processing system 200 may determine pitch information based on information from a bitstream of a legacy audio codec. Some such deployments will be described below. [00205] Figure 2E is a block diagram illustrating the elements of an alternative audio processing system. In this implementation, the audio processing system 200 includes an N-to-M channel extender/channel reducer 262 and an M-to-K channel extender/channel reducer 264. Here, the audio data elements 220a to 220n, which include transform coefficients for N audio channels, are received by the N-to-M channel extender/channel reducer 262 and the decorrelator 205. [00206] In this example, the N-to-M channel extender/channel reducer 262 can be configured to subject the audio data for the N channels to channel widening or channel reduction into audio data for M channels, according to mixing information 266. However, in some implementations, the N-to-M channel extender/channel reducer 262 may be a pass-through element. In such deployments, N=M. The mixing information 266 may include N-to-M mixing equations. The mixing information 266 may, for example, be received by the audio processing system 200 in a bit stream along with the decorrelation information 240, frequency domain representations corresponding to a coupling channel, etc. In this example, the decorrelation information 240 that is received by the decorrelator 205 indicates that the decorrelator 205 should output the M channels of the decorrelated audio data 230 to the switch 203. [00207] The switch 203 can determine, in accordance with the selection information 207, whether the direct audio data from the N-to-M channel extender/channel reducer 262 or the decorrelated audio data 230 will be forwarded to the M-to-K channel extender/channel reducer 264. The M-to-K channel extender/channel reducer 264 can be configured to subject the audio data for the M channels to channel widening or channel reduction into audio data for K channels, according to mixing information 268, which may include M-to-K mixing equations. For deployments where N=M, the M-to-K channel extender/channel reducer 264 can subject the audio data for the N channels to channel widening or channel reduction into audio data for K channels according to the mixing information 268, which in such deployments may include N-to-K mixing equations. The mixing information 268 may, for example, be received by the audio processing system 200 in a bit stream along with the decorrelation information 240 and other data. [00208] The N-to-M, M-to-K or N-to-K mixing equations can be channel widening or channel reduction equations. The N-to-M, M-to-K or N-to-K mixing equations can be a set of linear combination coefficients that map input audio signals to output audio signals. According to some such implementations, the M-to-K mixing equations can be stereo channel reduction equations. For example, the M-to-K channel extender/channel reducer 264 can be configured to subject audio data for 4, 5, 6 or more channels to channel reduction into audio data for 2 channels, according to the M-to-K mixing equations in the mixing information 268.
In some such deployments, the audio data for a left channel ("L"), a center channel ("C") and a left surround channel ("Ls") can be combined, according to the M-to-K mixing equations, into a left stereo output channel Lo. The audio data for a right channel ("R"), the center channel and a right surround channel ("Rs") can be combined, according to the M-to-K mixing equations, into a right stereo output channel Ro. For example, the M-to-K mixing equations might be as follows:

Lo = L + 0.707C + 0.707Ls
Ro = R + 0.707C + 0.707Rs

[00209] Alternatively, the M-to-K mixing equations can be as follows:

Lo = L + (-3dB)*C + att*Ls
Ro = R + (-3dB)*C + att*Rs,

[00210] where att can, for example, represent a value such as -3dB, -6dB, -9dB, or zero. For deployments where N=M, the above equations can be considered N-to-K mixing equations. [00211] In this example, the decorrelation information 240 that is received by the decorrelator 205 indicates that the audio data for the M channels will subsequently be subjected to channel widening or channel reduction to K channels. The decorrelator 205 can be configured to use a different decorrelation process depending on whether the data for the M channels will subsequently be subjected to channel widening or channel reduction to audio data for K channels. Accordingly, the decorrelator 205 can be configured to determine the decorrelation filtering processes based, at least in part, on the M-to-K mixing equations. For example, if the M channels are subsequently subjected to channel reduction to K channels, different decorrelation filters can be used for the channels that will be combined in the subsequent channel reduction. According to one such example, if the decorrelation information 240 indicates that the audio data for the L, R, Ls and Rs channels will undergo channel reduction to 2 channels, one decorrelation filter can be used for both the L and R channels and another decorrelation filter can be used for both the Ls and Rs channels. [00212] In some deployments, M=K. In such deployments, the M-to-K channel extender/channel reducer 264 may be a pass-through element. [00213] However, in other deployments, M>K. In such deployments, the M-to-K channel extender/channel reducer 264 can function as a channel reducer. According to some such deployments, a less computationally intensive method of generating a decorrelated channel reduction can be used. For example, the decorrelator 205 can be configured to generate decorrelated audio data 230 only for the channels that the switch 203 will send to the inverse transform module 255. For example, if N=6 and M=2, the decorrelator 205 can be configured to output decorrelated audio data 230 for only the 2 channels that result from channel reduction. In the process, the decorrelator 205 can use decorrelation filters for only 2 channels instead of 6, reducing complexity. The corresponding mixing information can be included in the decorrelation information 240, the mixing information 266 and the mixing information 268. Consequently, the decorrelator 205 can be configured to determine the decorrelation filtering processes based, at least in part, on the N-to-M, N-to-K, or M-to-K mixing equations.
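For reference, the 5-to-2 downmix quoted in paragraphs [00208] to [00210], written out as code. Treating the -3 dB coefficient as 10^(-3/20), which is approximately 0.707, reproduces the first pair of equations; the function name is illustrative.

```python
def downmix_5_to_2(L, R, C, Ls, Rs, att=10 ** (-3 / 20)):
    """Stereo (M-to-K) downmix per the equations above. `att` may be
    -3 dB (the default here, as a linear gain), -6 dB, -9 dB, or zero."""
    c_gain = 10 ** (-3 / 20)  # -3 dB as a linear gain, approximately 0.707
    Lo = L + c_gain * C + att * Ls
    Ro = R + c_gain * C + att * Rs
    return Lo, Ro
```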
[00214] Figure 2F is a block diagram showing examples of decorrelator elements. The elements shown in Figure 2F may, for example, be implemented in a logic system of a decoding apparatus, such as the apparatus described below with reference to Figure 12. Figure 2F depicts a decorrelator 205 that includes a decorrelation signal generator 218 and a mixer 215. In some embodiments, the decorrelator 205 may include other elements. Examples of other elements of the decorrelator 205, and of how they might work, are set out elsewhere in this document. [00215] In this example, audio data 220 is input to the decorrelation signal generator 218 and the mixer 215. The audio data 220 may correspond to a plurality of audio channels. For example, the audio data 220 may include data resulting from channel coupling during an audio encoding process that has undergone channel widening before being received by the decorrelator 205. In some embodiments, the audio data 220 may be in the time domain, while in other embodiments, the audio data 220 may be in the frequency domain. For example, the audio data 220 may include time sequences of transform coefficients. [00216] The decorrelation signal generator 218 can form one or more decorrelation filters, apply the decorrelation filters to the audio data 220, and supply the resulting decorrelation signals 227 to the mixer 215. In this example, the mixer combines the audio data 220 with the decorrelation signals 227 to produce the decorrelated audio data 230. [00217] In some embodiments, the decorrelation signal generator 218 may determine decorrelation filter control information for a decorrelation filter. According to some such embodiments, the decorrelation filter control information can correspond to a maximum pole shift of the decorrelation filter. The decorrelation signal generator 218 can determine decorrelation filter parameters for the audio data 220 based, at least in part, on the decorrelation filter control information. [00218] In some deployments, determining the decorrelation filter control information may involve receiving an express indication of the decorrelation filter control information (e.g., an express indication of a maximum pole shift) with the audio data 220. In alternative deployments, determining the decorrelation filter control information may involve determining audio characteristic information and determining the decorrelation filter parameters (such as a maximum pole shift) based, at least in part, on the audio characteristic information. In some deployments, the audio characteristic information may include spatial information, pitch information, and/or transient information. [00219] Some deployments of the decorrelator 205 will now be described in more detail with reference to Figures 3 to 5E. Figure 3 is a flowchart illustrating an example of a decorrelation process. Figure 4 is a block diagram illustrating examples of decorrelator components that may be configured to perform the decorrelation process of Figure 3. The decorrelation process 300 of Figure 3 may be performed, at least in part, on a decoding apparatus such as the one described below with reference to Figure 12. [00220] In this example, process 300 starts when a decorrelator receives the audio data (block 305). As described above with reference to Figure 2F, the audio data can be received by the decorrelation signal generator 218 and the mixer 215 of the decorrelator 205. Here, at least some of the audio data is received from a channel enhancer, such as the channel enhancer 225 of Figure 2D. As such, the audio data corresponds to a plurality of audio channels. In some implementations, the audio data received by the decorrelator may include a time sequence of frequency domain representations of the audio data (such as MDCT coefficients) in the coupling channel frequency range of each channel. In alternative deployments, the audio data may be in the time domain.
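Before continuing with the Figure 3 walkthrough, here is a sketch of how the mixer 215 of paragraph [00216] might combine its two inputs. The power-preserving weighting (alpha squared plus beta squared equal to 1) is a common convention assumed for illustration, not a formula quoted from this disclosure:

```python
import numpy as np

def mix_direct_and_decorrelated(direct, decorr_signal, alpha):
    """direct: audio data 220; decorr_signal: decorrelation signals 227;
    alpha: a spatial parameter in [0, 1] giving the weight of the direct
    portion. beta is chosen so that alpha**2 + beta**2 == 1, preserving
    signal power (an illustrative convention)."""
    beta = np.sqrt(1.0 - alpha ** 2)
    return alpha * direct + beta * decorr_signal
```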
[00221] In block 310, the decorrelation filter control information is determined. The decorrelation filter control information can, for example, be determined according to the audio characteristics of the audio data. In some deployments, such as the example shown in Figure 4, such audio characteristics may include explicit spatial information, pitch information, and/or transient information encoded with the audio data. [00222] In the embodiment shown in Figure 4, the decorrelation filter 410 includes a fixed delay 415 and a time-varying portion 420. In this example, the decorrelation signal generator 218 includes a decorrelation filter control module 405 to control the time-varying portion 420 of the decorrelation filter 410. In this example, the decorrelation filter control module 405 receives explicit pitch information 425 in the form of a tone flag. In this deployment, the decorrelation filter control module 405 also receives explicit transient information 430. In some deployments, the explicit pitch information 425 and/or the explicit transient information 430 may be received with the audio data, for example, as part of the decorrelation information 240. In some deployments, the explicit pitch information 425 and/or the explicit transient information 430 may be generated locally. [00223] In some implementations, no explicit spatial information, pitch information, or transient information is received by the decorrelator 205. In some such implementations, a transient control module of the decorrelator 205 (or another element of an audio processing system) may be configured to determine transient information based on one or more attributes of the audio data. A spatial parameter module of the decorrelator 205 can be configured to determine spatial parameters based on one or more attributes of the audio data. Some examples are described elsewhere in this document. [00224] In block 315 of Figure 3, the decorrelation filter parameters for the audio data are determined, at least in part, based on the decorrelation filter control information determined in block 310. A decorrelation filter may then be formed in accordance with the decorrelation filter parameters, as shown at block 320. The filter may, for example, be a linear filter with at least one delay element. In some deployments, the filter may be based, at least in part, on a meromorphic function. For example, the filter might include an all-pass filter. [00225] In the implementation shown in Figure 4, the decorrelation filter control module 405 can control the time-varying portion 420 of the decorrelation filter 410 based, at least in part, on the tone flag 425 and/or the explicit transient information 430 received by the decorrelator 205 in the bitstream. Some examples are described below. In this example, the decorrelation filter 410 is only applied to audio data in the coupling channel frequency range. [00226] In this embodiment, the decorrelation filter 410 includes a fixed delay 415 followed by the time-varying portion 420, which is an all-pass filter in this example (a stripped-down sketch of this structure follows below). In some embodiments, the decorrelation signal generator 218 may include a bank of all-pass filters. For example, in some embodiments where the audio data 220 is in the frequency domain, the decorrelation signal generator 218 may include an all-pass filter for each of a plurality of frequency ranges. However, in alternative deployments, the same filter can be applied to each frequency range. Alternatively, frequency ranges can be grouped and the same filter applied to each group; for example, frequency ranges can be grouped into frequency bands, grouped by channel, and/or grouped by both frequency band and channel.
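The following is a minimal sketch of the structure in paragraph [00226], a fixed delay followed by an all-pass filter, applied along the time sequence of transform coefficients in one frequency bin. A single real pole and a two-hop delay are illustrative placeholders; the example discussed below uses a 3rd-order all-pass with a complex conjugate pair and one real pole.

```python
import numpy as np
from scipy.signal import lfilter

def decorrelation_filter(bin_sequence, pole=0.35, delay=2):
    """Apply H(z) = z**(-delay) * (-pole + z**-1) / (1 - pole * z**-1):
    a fixed delay followed by a first-order all-pass section, operating
    entirely on real-valued coefficients."""
    delayed = np.concatenate([np.zeros(delay), np.asarray(bin_sequence, float)])
    filtered = lfilter([-pole, 1.0], [1.0, -pole], delayed)
    return filtered[: len(bin_sequence)]  # keep the original frame count
```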
[00227] The amount of fixed delay can be selectable, for example, by a logic device and/or according to user input. In order to introduce controlled chaos into the decorrelation signals 227, the decorrelation filter control module 405 can apply the decorrelation filter parameters to control the poles of the all-pass filter(s) so that one or more of the poles move randomly or pseudorandomly within a restricted region. [00228] Consequently, the decorrelation filter parameters can include parameters for moving at least one pole of the all-pass filter. Such parameters may include parameters for dithering one or more poles of the all-pass filter. Alternatively, the decorrelation filter parameters may include parameters for selecting a pole location from among a plurality of predetermined pole locations for each pole of the all-pass filter. At a predetermined time interval (for example, once in each Dolby Digital Plus block), a new location for each pole of the all-pass filter can be chosen randomly or pseudorandomly. [00229] Some such deployments will now be described with reference to Figures 5A to 5E. Figure 5A is a graph showing an example of the movement of the poles of an all-pass filter. The plot 500 is a pole plot of a 3rd-order all-pass filter. In this example, the filter has two complex poles (poles 505a and 505c) and one real pole (pole 505b). The large circle is the unit circle 515. Over time, the pole locations can be dithered (or otherwise changed) so that they move within the restriction areas 510a, 510b, and 510c, which constrain the possible paths of the poles 505a, 505b and 505c, respectively. [00230] In this example, the restriction areas 510a, 510b, and 510c are circular. The initial (or "seed") locations of the poles 505a, 505b and 505c are indicated by the circles at the centers of the restriction areas 510a, 510b and 510c. In the example of Figure 5A, the restriction areas 510a, 510b, and 510c are circles of radius 0.2 centered on the initial pole locations. Poles 505a and 505c correspond to a complex conjugate pair, while pole 505b is a real pole. [00231] However, other deployments may include more or fewer poles. Alternative deployments can also include restriction areas of different sizes or shapes. Some examples are shown in Figures 5D and 5E and are described below. [00232] In some deployments, different channels of audio data share the same restriction areas. However, in alternative deployments, the audio data channels do not share the same restriction areas. Whether or not the audio data channels share the same restriction areas, the poles can be dithered (or otherwise moved) independently for each audio channel. [00233] A sample path of pole 505a is indicated by arrows within the restriction area 510a. Each arrow represents a move or "step" 520 of the pole 505a. Although not shown in Figure 5A, the two poles of the complex conjugate pair, poles 505a and 505c, move in tandem so that the poles retain their conjugate relationship. [00234] In some implementations, the movement of a pole can be controlled by changing the maximum step value. The maximum step value can correspond to a maximum pole offset from the most recent pole location. The maximum step value can define a circle that has a radius equal to the maximum step value. [00235] Such an example is shown in Figure 5A. Pole 505a is shifted from its initial location by step 520a to location 505a'.
Step 520a may have been constrained according to a previous maximum step value, for example, an initial maximum step value. After pole 505a moves from its initial location to location 505a', a new maximum step value is determined. The maximum step value defines a maximum step circle 525 that has a radius equal to the maximum step value. In the example shown in Figure 5A, the next step (step 520b) happens to be equal to the maximum step value. Therefore, step 520b moves the pole to location 505a'', on the circumference of the maximum step circle 525. In general, however, steps 520 can be smaller than the maximum step value. [00236] In some implementations, the maximum step value may be reset after each step. In other implementations, the maximum step value may be reset after multiple steps and/or as the audio data change. [00237] The maximum step value can be determined and/or controlled in various ways. In some implementations, the maximum step value may be based, at least in part, on one or more attributes of the audio data to which the decorrelation filter will be applied. [00238] For example, the maximum step value may be based, at least in part, on tonality information and/or transient information. According to some such implementations, the maximum step value can be at or near zero for highly tonal signals in the audio data (such as the audio data for a tuning fork, a spinet, etc.), which causes little or no change in the poles to occur. In some implementations, the maximum step value may be at or near zero at the instant of an attack of a transient signal (such as audio data for an explosion, a door slam, etc.). Subsequently (for example, over a time period of a few blocks), the maximum step value can be raised to a higher value. [00239] In some implementations, tonality and/or transient information may also be detected at the decoder, based on one or more attributes of the audio data. For example, tonality and/or transient information can be determined according to one or more attributes of the audio data by a module such as the control information receiver/generator 640, which is described below with reference to Figures 6B and 6C. Alternatively, explicit tonality and/or transient information may be transmitted from the encoder and received in a bitstream received by a decoder, for example, via tonality flags and/or transient flags. [00240] In this implementation, the movement of a pole can be controlled according to dithering parameters. Consequently, although the movement of a pole may be constrained according to a maximum step value, the direction and/or extent of pole movement may include a random or quasi-random component. For example, the movement of a pole may be based, at least in part, on the output of a pseudorandom number generator or a random number generator algorithm implemented in software. Such software can be stored on non-transient media and executed by a logic system. [00241] However, in alternative implementations, the decorrelation filter parameters may not involve dithering parameters. Instead, pole movement can be restricted to predetermined pole locations. For example, several predetermined pole locations can lie within a radius defined by a maximum step value. A logic system may randomly or pseudo-randomly select one of these predetermined pole locations as the next pole location. [00242] Various other methods can be employed to control pole motion.
In some implementations, if a pole approaches the boundary of a constraint area, the selection of pole moves may be biased toward new pole locations that are closer to the center of the constraint area. For example, if pole 505a moves to the edge of the constraint area 510a, the center of the maximum step circle 525 can be shifted inward, toward the center of the constraint area 510a, so that the maximum step circle 525 always lies within the boundary of the constraint area 510a. [00243] In some such implementations, a weighting function can be applied in order to create a bias that tends to move a pole location away from a constraint area boundary. For example, the predetermined pole locations within the maximum step circle 525 may not be assigned equal probabilities of being selected as the next pole location. Rather, predetermined pole locations that are closer to the center of the constraint area can be assigned a higher probability than predetermined pole locations that are relatively farther from the center of the constraint area. According to some such implementations, when pole 505a is near the boundary of constraint area 510a, the next pole move is more likely to be toward the center of constraint area 510a. [00244] In this example, the locations of pole 505b also change, but are controlled so that pole 505b remains real. Accordingly, the locations of pole 505b are constrained to lie along the diameter 530 of the constraint area 510b. In alternative implementations, however, pole 505b can be moved to locations that have an imaginary component. [00245] In still other implementations, the locations of all the poles can be constrained to move only along radii. In some such implementations, changes in pole location only increase or decrease the magnitudes of the poles, but do not affect their phases. Such implementations can be useful, for example, to control a selected reverberation time constant. [00246] Poles for frequency coefficients corresponding to higher frequencies may be relatively closer to the center of the unit circle 515 than poles for frequency coefficients corresponding to lower frequencies. Figure 5B, a variation of Figure 5A, will be used to illustrate an example implementation. Here, at a given instant of time, the triangles 505a''', 505b''' and 505c''' indicate the pole locations at the frequency f0 obtained after dithering or some other process that describes their time variation. Let the pole at 505a''' be denoted z1 and the pole at 505b''' be denoted z2. The pole at 505c''' is the complex conjugate of the pole at 505a''' and is therefore denoted z1*, where the asterisk indicates complex conjugation. [00247] The poles of the filter used at any other frequency f are obtained, in this example, by scaling the poles z1, z2 and z1* by a factor a(f)/a(f0), where a(f) is a function that decreases with the audio data frequency f. When f = f0, the scaling factor is equal to 1 and the poles are in the expected locations. According to some such implementations, smaller group delays can thereby be applied to frequency coefficients that correspond to higher frequencies. In the implementation described here, the poles are dithered at one frequency and scaled to obtain the pole locations for other frequencies. The frequency f0 could be, for example, the coupling start frequency.
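The following is a minimal Python sketch of the constrained random pole movement described above, assuming circular constraint areas, a step drawn within the maximum step circle, and the per-frequency scaling of paragraph [00247]. The names and the uniform step distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def dither_pole(pole, seed_location, max_step, constraint_radius):
    """Move a pole by a random step no longer than max_step, clipping
    the result so it stays inside the circular constraint area centered
    on the pole's seed location. The complex conjugate partner (e.g.,
    pole 505c) would be moved in tandem as np.conj(new_pole)."""
    step = max_step * np.sqrt(rng.uniform()) * np.exp(1j * rng.uniform(0, 2 * np.pi))
    candidate = pole + step
    offset = candidate - seed_location
    if abs(offset) > constraint_radius:   # pull back inside the constraint area
        candidate = seed_location + offset * (constraint_radius / abs(offset))
    return candidate

def poles_at_frequency(poles_f0, a_f, a_f0):
    """Scale the poles dithered at f0 by a(f)/a(f0) to obtain the poles
    for another frequency f; a(f) is a decreasing function of f."""
    return [p * (a_f / a_f0) for p in poles_f0]
```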
In alternative implementations, the poles could be dithered separately at each frequency, and the constraint areas (510a, 510b and 510c) could be substantially closer to the origin at higher frequencies than at lower frequencies. [00248] In accordance with various implementations described herein, the poles 505 may be movable, but may maintain a substantially consistent spatial or angular relationship to one another. In some such implementations, the movements of the poles 505 may not be limited to constraint areas. [00249] Figure 5C shows such an example. In this example, the complex conjugate poles 505a and 505c can be moved in a clockwise or counterclockwise direction within the unit circle 515. When poles 505a and 505c are moved (for example, within a predetermined time interval), both poles can be rotated by an angle θ that is selected randomly or quasi-randomly. In some implementations, this angular movement can be constrained by a maximum angular step value. In the example shown in Figure 5C, pole 505a has been moved through an angle θ in a clockwise direction. Consequently, pole 505c is moved through an angle θ in a counterclockwise direction, in order to maintain the complex conjugate relationship between pole 505a and pole 505c. [00250] In this example, pole 505b is constrained to move along the real axis. In some such implementations, poles 505a and 505c may also be movable toward or away from the center of the unit circle 515, for example, as described above with reference to Figure 5B. In alternative implementations, pole 505b may not be moved. In still other implementations, pole 505b can be moved off the real axis. [00251] In the examples shown in Figures 5A and 5B, the constraint areas 510a, 510b and 510c are circular. However, various other constraint area shapes are contemplated by the inventors. For example, the constraint area 510d of Figure 5D is substantially oval in shape. Pole 505d can be positioned at various locations within the oval constraint area 510d. In the example of Figure 5E, the constraint area 510e is an annulus. Pole 505e can be positioned at various locations within the annulus of constraint area 510e. [00252] Returning now to Figure 3, in block 325 a decorrelation filter is applied to at least some of the audio data. For example, the decorrelation signal generator 218 of Figure 4 may apply a decorrelation filter to at least some of the input audio data 220. The output of the decorrelation filter 227 may be uncorrelated with the input audio data 220. Furthermore, the output of the decorrelation filter can have substantially the same power spectral density as the input signal. Therefore, the output of the decorrelation filter 227 may sound natural. In block 330, the output of the decorrelation filter is mixed with the input audio data. In block 335, the decorrelated audio data are output. In the example of Figure 4, in block 330 the mixer 215 combines the output of the decorrelation filter 227 (which may be referred to herein as "filtered audio data") with the input audio data 220 (which may be referred to herein as "direct audio data"). In block 335, the mixer 215 outputs the decorrelated audio data 230. If it is determined in block 340 that more audio data will be processed, the decorrelation process 300 returns to block 305; otherwise, the decorrelation process 300 ends (block 345). [00253] Figure 6A is a block diagram illustrating an alternative implementation of a decorrelator.
In this example, the mixer 215 and the decorrelation signal generator 218 receive audio data elements 220 that correspond to a plurality of channels. At least some of the audio data elements 220 can, for example, be output from an upmixer, such as the upmixer 225 of Figure 2D. [00254] Here, the mixer 215 and the decorrelation signal generator 218 also receive various types of decorrelation information. In some implementations, at least some of the decorrelation information may be received in a bitstream along with the audio data elements 220. Alternatively or additionally, at least some of the decorrelation information may be determined locally, for example, by other components of the decorrelator 205 or by one or more other components of the audio processing system 200. [00255] In this example, the received decorrelation information includes the decorrelation signal generator control information 625. The decorrelation signal generator control information 625 may include decorrelation filter information, gain information, input control information, etc. The decorrelation signal generator 218 produces the decorrelation signals 227 based, at least in part, on the decorrelation signal generator control information 625. [00256] Here, the received decorrelation information also includes the transient control information 430. Various examples of how the decorrelator 205 can use and/or generate the transient control information 430 are provided elsewhere in this disclosure. [00257] In this implementation, the mixer 215 includes the synthesizer 605 and the direct signal and decorrelation signal mixer 610. In this example, the synthesizer 605 is an output-channel-specific combiner of decorrelation or reverberation signals, such as the decorrelation signals 227 received from the decorrelation signal generator 218. According to some such implementations, the synthesizer 605 may be a linear combiner of the decorrelation or reverberation signals. In this example, the decorrelation signals 227 correspond to audio data elements 220 for a plurality of channels, to which one or more decorrelation filters have been applied by the decorrelation signal generator 218. Accordingly, the decorrelation signals 227 may also be referred to herein as "filtered audio data" or "filtered audio data elements". [00258] Here, the direct signal and decorrelation signal mixer 610 is an output-channel-specific combiner of the filtered audio data elements with the "direct" audio data elements 220 that correspond to a plurality of channels, to produce the decorrelated audio data 230. Accordingly, the decorrelator 205 can provide channel-specific, non-hierarchical decorrelation of the audio data. [00259] In this example, the synthesizer 605 combines the decorrelation signals 227 according to the decorrelation signal synthesis parameters 615, which may also be referred to herein as "decorrelation signal synthesis coefficients". Similarly, the direct signal and decorrelation signal mixer 610 combines the filtered and direct audio data elements according to the mixing coefficients 620. The decorrelation signal synthesis parameters 615 and the mixing coefficients 620 may be based, at least in part, on the received decorrelation information. [00260] Here, the received decorrelation information includes the spatial parameter information 630, which is channel-specific in this example.
In some implementations, the mixer 215 can be configured to determine the decorrelation signal synthesis parameters 615 and/or the mixing coefficients 620 based, at least in part, on the spatial parameter information 630. In this example, the received decorrelation information also includes downmix/upmix information 635. For example, the downmix/upmix information 635 can indicate how many channels of the audio data were combined to produce the downmixed audio data, which may correspond to one or more coupling channels in a coupling channel frequency range. The downmix/upmix information 635 can also indicate the number of desired output channels and/or characteristics of the output channels. As described above with reference to Figure 2E, in some implementations the downmix/upmix information 635 may include information that corresponds to the mixing information 266 received by the N-to-M upmixer/downmixer 262 and/or the mixing information 268 received by the M-to-K upmixer/downmixer 264. [00261] Figure 6B is a block diagram illustrating another implementation of a decorrelator. In this example, the decorrelator 205 includes a control information receiver/generator 640. Here, the control information receiver/generator 640 receives audio data elements 220 and 245. In this example, the audio data elements 220 are also received by the mixer 215 and the decorrelation signal generator 218. In some implementations, the audio data elements 220 may correspond to audio data in a coupling channel frequency range, while the audio data elements 245 may correspond to audio data in one or more frequency ranges outside the coupling channel frequency range. [00262] In this implementation, the control information receiver/generator 640 determines the decorrelation signal generator control information 625 and the mixer control information 645 according to the decorrelation information 240 and/or the audio data elements 220 and/or 245. Some examples of the control information receiver/generator 640 and its functionality are described below. [00263] Figure 6C illustrates an alternative implementation of an audio processing system. In this example, the audio processing system 200 includes a decorrelator 205, a switch 203 and an inverse transform module 255. In some implementations, the switch 203 and the inverse transform module 255 may be substantially as described above with reference to Figure 2A. Similarly, the mixer 215 and the decorrelation signal generator 218 may be substantially as described elsewhere herein. [00264] The control information receiver/generator 640 may have different functionality depending on the specific implementation. In this implementation, the control information receiver/generator 640 includes a filter control module 650, a transient control module 655, a mixer control module 660 and a spatial parameter module 665. As with other elements of the audio processing system 200, the elements of the control information receiver/generator 640 may be implemented through hardware, firmware, software stored on non-transient media, and/or combinations thereof. In some implementations, these components may be implemented by a logic system as described elsewhere in this disclosure. [00265] The filter control module 650 may, for example, be configured to control the decorrelation signal generator 218 as described above with reference to Figures 2E to 5E and/or as described below with reference to Figure 11B.
Several examples of the functionality of the transient control module 655 and the mixer control module 660 are provided below. [00266] In this example, the control information receiver/generator 640 receives the audio data elements 220 and 245, which may include at least a portion of the audio data received by the switch 203 and/or the decorrelator 205. The audio data elements 220 are received by the mixer 215 and the decorrelation signal generator 218. In some implementations, the audio data elements 220 may correspond to audio data in a coupling channel frequency range, whereas the audio data elements 245 may correspond to audio data in a frequency range outside the coupling channel frequency range. For example, the audio data elements 245 may correspond to audio data in a frequency range above and/or below the coupling channel frequency range. [00267] In this implementation, the control information receiver/generator 640 determines the decorrelation signal generator control information 625 and the mixer control information 645 according to the decorrelation information 240, the audio data elements 220 and/or the audio data elements 245. The control information receiver/generator 640 provides the decorrelation signal generator control information 625 and the mixer control information 645 to the decorrelation signal generator 218 and the mixer 215, respectively. [00268] In some implementations, the control information receiver/generator 640 can be configured to determine tonality information, and to determine the decorrelation signal generator control information 625 and/or the mixer control information 645 based, at least in part, on the tonality information. For example, the control information receiver/generator 640 can be configured to receive explicit tonality information, such as tonality flags, as part of the decorrelation information 240. The control information receiver/generator 640 can be configured to process the received explicit tonality information and determine tonality control information. [00269] For example, if the control information receiver/generator 640 determines that the audio data in the coupling channel frequency range are highly tonal, the control information receiver/generator 640 can be configured to provide decorrelation signal generator control information 625 indicating that the maximum step value should be set at or near zero, so that little or no variation in the poles occurs. Subsequently (for example, over a time period of a few blocks), the maximum step value can be raised to a higher value. In some implementations, if the control information receiver/generator 640 determines that the audio data in the coupling channel frequency range are highly tonal, the control information receiver/generator 640 can be configured to indicate to the spatial parameter module 665 that a relatively greater degree of smoothing can be applied in calculating various quantities, such as the energies used in estimating the spatial parameters. Other examples of responses to the determination of highly tonal audio data are provided elsewhere in this document. [00270] In some implementations, the control information receiver/generator 640 can be configured to determine tonality information according to one or more attributes of the audio data 220 and/or according to information from a bitstream of a legacy audio codec that is received via the decorrelation information 240, such as exponent information and/or exponent strategy information.
[00271] For example, in a bitstream of audio data encoded according to the E-AC-3 audio codec, the exponents of the transform coefficients are differentially encoded. The sum of absolute exponent differences over a frequency range is a measurement of the distance traveled along the spectral envelope of the signal in a log-magnitude domain. Signals such as those of a tuning fork or a spinet have a picket-fence spectrum, so the path along which this distance is measured is characterized by many peaks and valleys. Thus, for such signals, the distance traveled along the spectral envelope in a given frequency range is greater than for signals corresponding to, for example, applause or rain, which have a relatively flat spectrum. [00272] Therefore, in some implementations, the control information receiver/generator 640 may be configured to determine a tonality metric based, at least in part, on exponent differences in the coupling channel frequency range. For example, the control information receiver/generator 640 can be configured to determine a tonality metric based on the average absolute exponent difference over the coupling channel frequency range. According to some such implementations, the tonality metric is only calculated when the coupling exponent strategy is shared by all blocks in a frame and does not indicate exponent sharing across frequency, in which case it is meaningful to define the exponent difference from one frequency range to the next. According to some implementations, the tonality metric is only calculated if the E-AC-3 adaptive hybrid transform ("AHT") flag is set for the coupling channel. [00273] If the tonality metric is determined from the absolute exponent differences of E-AC-3 audio data, the tonality metric can, in some implementations, take a value between 0 and 2, because -2, -1, 0, 1 and 2 are the only exponent differences allowed in E-AC-3. One or more tonality thresholds can be set in order to differentiate between tonal and non-tonal signals. For example, some implementations involve setting one threshold for entering a tonal state and another threshold for exiting the tonal state. The threshold for exiting the tonal state can be lower than the threshold for entering the tonal state. Such implementations provide a degree of hysteresis, so that tonality values just below the upper threshold do not inadvertently cause a change of tonal state. In one example, the threshold for exiting the tonal state is 0.40, while the threshold for entering the tonal state is 0.45. However, other implementations may include more or fewer thresholds, and the thresholds may have different values. [00274] In some implementations, the tonality metric calculation may be weighted according to the energy present in the signal. This energy can be derived directly from the exponents. The log energy metric can be inversely related to the exponents, because exponents are represented as negative powers of two in E-AC-3. According to such implementations, the parts of the spectrum that have low energy contribute less to the overall tonality metric than the parts of the spectrum that have high energy. In some implementations, the tonality metric calculation can only be performed at block zero of a frame.
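The following is a minimal Python sketch of such a tonality metric with hysteresis, assuming the exponents for the coupling channel frequency range are available as an integer array; the optional energy weighting and all names are illustrative assumptions.

```python
import numpy as np

def tonality_metric(exponents, energy_weights=None):
    """Average absolute difference of adjacent exponents; large for
    picket-fence spectra, small for flat spectra. Optionally weighted
    so low-energy parts of the spectrum contribute less."""
    diffs = np.abs(np.diff(np.asarray(exponents, dtype=float)))
    if energy_weights is None:
        return float(diffs.mean())
    w = np.asarray(energy_weights, dtype=float)[1:]
    return float((diffs * w).sum() / w.sum())

class TonalState:
    """Hysteresis: enter the tonal state above 0.45, exit below 0.40."""
    def __init__(self, enter_threshold=0.45, exit_threshold=0.40):
        self.enter, self.exit, self.tonal = enter_threshold, exit_threshold, False

    def update(self, metric):
        if self.tonal and metric < self.exit:
            self.tonal = False
        elif not self.tonal and metric > self.enter:
            self.tonal = True
        return self.tonal
```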
[00275] In the example shown in Figure 6C, the decorrelated audio data 230 from the mixer 215 are provided to the switch 203. In some implementations, the switch 203 can determine which components of the direct audio data 220 and of the decorrelated audio data 230 will be sent to the inverse transform module 255. Accordingly, in some implementations the audio processing system 200 may provide selective or signal-adaptive decorrelation of audio data components. For example, in some implementations the audio processing system 200 may provide selective or signal-adaptive decorrelation of specific channels of audio data. Alternatively or additionally, in some implementations the audio processing system 200 may provide selective or signal-adaptive decorrelation of specific frequency bands of audio data. [00276] In various implementations of the audio processing system 200, the control information receiver/generator 640 can be configured to determine one or more types of spatial parameters from the audio data 220. In some implementations, at least some of this functionality can be provided by the spatial parameter module 665 shown in Figure 6C. Some of these spatial parameters may be correlation coefficients between individual discrete channels and a coupling channel, which may also be referred to herein as "alphas". For example, if the coupling channel includes audio data for four channels, there may be four alphas, one for each channel. In some such implementations, the four channels may be the left channel ("L"), the right channel ("R"), the left surround channel ("Ls") and the right surround channel ("Rs"). In some implementations, the coupling channel may include audio data for the above-described channels and a center channel. An alpha may or may not be calculated for the center channel, depending on whether the center channel will be decorrelated. Other implementations may involve more or fewer channels. [00277] Other spatial parameters may be inter-channel correlation coefficients that indicate a correlation between pairs of individual discrete channels. These parameters may sometimes be referred to in this document as reflecting "inter-channel coherence", or "ICC". In the four-channel example referenced above, there can be six ICC values involved, for the L-R, L-Ls, L-Rs, R-Ls, R-Rs and Ls-Rs pairs. [00278] In some implementations, the determination of spatial parameters by the control information receiver/generator 640 may involve receiving explicit spatial parameters in a bitstream, for example, via the decorrelation information 240. Alternatively or additionally, the control information receiver/generator 640 can be configured to estimate at least some spatial parameters. The control information receiver/generator 640 can be configured to determine mixing parameters based, at least in part, on the spatial parameters. Consequently, in some implementations, functions related to determining and processing spatial parameters can be performed, at least in part, by the mixer control module 660. [00279] Figures 7A and 7B are vector diagrams that provide a simplified illustration of spatial parameters. They can be considered a conceptual representation of signals in an N-dimensional vector space. Each N-dimensional vector can represent a real- or complex-valued random variable whose N coordinates correspond to any N independent trials. For example, the N coordinates can correspond to a collection of N frequency domain coefficients of a signal within a frequency range and/or within a time interval (e.g., during a few audio blocks).
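Under these definitions, an alpha or an ICC is simply a normalized correlation over a collection of frequency-domain coefficients. The following is a minimal Python sketch of such an estimate; the function name and the real-part convention are illustrative assumptions.

```python
import numpy as np

def normalized_correlation(a, b):
    """Correlation coefficient between two collections of (possibly
    complex) frequency-domain coefficients, e.g. a discrete channel and
    the coupling channel over one band and a few blocks."""
    num = np.sum(a * np.conj(b))
    den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
    return float(num.real / den) if den > 0 else 0.0

# alpha for the left channel:  normalized_correlation(s_l, x_mono)
# ICC for the L-R pair:        normalized_correlation(s_l, s_r)
```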
[00280] Referring first to the left panel of Figure 7A, this vector diagram represents the spatial relationships between a left input channel lin, a right input channel rin, and a coupling channel xmono, a mono downmix formed by summing lin and rin. Figure 7A is a simplified example of forming a coupling channel, which can be performed by an encoding apparatus. The correlation coefficient between the left input channel lin and the coupling channel xmono is αL, and the correlation coefficient between the right input channel rin and the coupling channel is αR. Consequently, the angle θL between the vectors representing the left input channel lin and the coupling channel xmono is equal to arccos(αL), and the angle θR between the vectors representing the right input channel rin and the coupling channel xmono is equal to arccos(αR). [00281] The right panel of Figure 7A shows a simplified example of decorrelating an individual output channel from a coupling channel. Such a decorrelation process can be carried out, for example, by a decoding apparatus. By generating a decorrelation signal yL that is uncorrelated with (perpendicular to) the coupling channel xmono and mixing it with the coupling channel xmono using appropriate weights, the amplitude of the individual output channel (lout, in this example) and its angular separation from the coupling channel xmono can accurately reflect the amplitude of the individual input channel and its spatial relationship to the coupling channel. The decorrelation signal yL should have the same power distribution (represented here by vector length) as the coupling channel xmono. [00282] However, restoring the spatial relationship between individual discrete channels and a coupling channel does not guarantee the restoration of the spatial relationships between the discrete channels (represented by the ICCs). This fact is illustrated in Figure 7B. The two panels of Figure 7B show two extreme cases. The separation between lout and rout is maximized when the decorrelation signals yL and yR are separated by 180°, as shown in the left panel of Figure 7B. In this case, the ICC between the left and right channels is minimized and the phase diversity between lout and rout is maximized. On the contrary, as shown in the right panel of Figure 7B, the separation between lout and rout is minimized when the decorrelation signals yL and yR are separated by 0°. In this case, the ICC between the left and right channels is maximized and the phase diversity between lout and rout is minimized. [00283] In the examples shown in Figure 7B, all the illustrated vectors are in the same plane. In other examples, yL and yR can be positioned at other angles with respect to each other. However, it is preferred that yL and yR be perpendicular, or at least substantially perpendicular, to the coupling channel xmono. In some examples, either of yL and yR can extend, at least partially, into a plane that is orthogonal to the plane of Figure 7B. [00284] Because the discrete channels are ultimately reproduced and presented to listeners, proper restoration of the spatial relationships between the discrete channels (the ICCs) can significantly improve the restoration of the spatial characteristics of the audio data. As can be seen from the examples of Figure 7B, an accurate restoration of ICCs depends on creating decorrelation signals (here, yL and yR) that have appropriate spatial relationships with each other.
This coherence between decorrelation signals may be referred to herein as the inter-decorrelation-signal coherence, or "IDC". [00285] In the left panel of Figure 7B, the IDC between yL and yR is -1. As noted above, this IDC corresponds to a minimum ICC between the left and right channels. Comparing the left panel of Figure 7B with the left panel of Figure 7A, it can be seen that in this example with two coupled channels, the spatial relationship between lout and rout accurately reflects the spatial relationship between lin and rin. In the right panel of Figure 7B, the IDC between yL and yR is 1 (full correlation). Comparing the right panel of Figure 7B with the left panel of Figure 7A, it can be seen that in this example the spatial relationship between lout and rout does not accurately reflect the spatial relationship between lin and rin. [00286] Consequently, by setting the IDC between individual spatially adjacent channels to -1, the ICC between these channels can be minimized, and the spatial relationship between the channels can be closely restored when these channels are dominant. This results in an overall sound image that is perceptually close to the sound image of the original audio signal. These methods may be referred to in this document as "sign-flip" methods. In these methods, no knowledge of the actual ICCs is required. [00287] Figure 8A is a flowchart illustrating blocks of some decorrelation methods provided in this document. As with other methods described herein, the blocks of method 800 are not necessarily performed in the order indicated. Moreover, some implementations of method 800 and of other methods may include more or fewer blocks than indicated or described. Method 800 begins with block 802, in which audio data corresponding to a plurality of audio channels are received. The audio data may, for example, be received by a component of an audio decoding system. In some implementations, the audio data may be received by a decorrelator of an audio decoding system, such as one of the implementations of the decorrelator 205 disclosed herein. The audio data may include audio data elements for a plurality of audio channels produced by upmixing audio data corresponding to a coupling channel. According to some implementations, the audio data may have been upmixed by applying channel-specific, time-varying scaling factors to the audio data corresponding to the coupling channel. Some examples are provided below. [00288] In this example, block 804 involves determining audio characteristics of the audio data. Here, the audio characteristics include spatial parameter data. The spatial parameter data can include alphas, the correlation coefficients between individual audio channels and the coupling channel. Block 804 may involve receiving spatial parameter data, for example, via the decorrelation information 240 described above with reference to Figures 2A and following. Alternatively or additionally, block 804 may involve estimating spatial parameters locally, for example, by the control information receiver/generator 640 (see, for example, Figure 6B or 6C). In some implementations, block 804 may involve determining other audio characteristics, such as transient characteristics or tonality characteristics. [00289] Here, block 806 involves determining at least two decorrelation filtering processes for the audio data based, at least in part, on the audio characteristics. The decorrelation filtering processes can be channel-specific decorrelation filtering processes.
According to some implementations, each of the decorrelation filtering processes determined in block 806 includes a sequence of operations related to decorrelation. [00290] Applying the at least two decorrelation filtering processes determined in block 806 may produce channel-specific decorrelation signals. For example, applying the decorrelation filtering processes determined in block 806 can produce a specific inter-decorrelation-signal coherence ("IDC") between channel-specific decorrelation signals for at least one pair of channels. Some of these decorrelation filtering processes may involve applying at least one decorrelation filter to at least a portion of the audio data (e.g., as described below with reference to block 820 of Figure 8B or Figure 8E) to produce filtered audio data, also referred to in this document as decorrelation signals. Additional operations can be performed on the filtered audio data to produce the channel-specific decorrelation signals. Some of these decorrelation filtering processes may involve a sign-flip process, such as one of the sign-flip processes described below with reference to Figures 8B to 8D. [00291] In some implementations, it can be determined in block 806 that the same decorrelation filter will be used to produce filtered audio data corresponding to all the channels that will be decorrelated, while in other implementations it can be determined in block 806 that a different decorrelation filter will be used to produce filtered audio data for at least some of the channels that will be decorrelated. In some implementations, it may be determined in block 806 that audio data corresponding to a center channel will not be decorrelated, whereas in other implementations block 806 may involve determining a different decorrelation filter for audio data of a center channel. Furthermore, while in some implementations each of the decorrelation filtering processes determined in block 806 includes a sequence of operations related to decorrelation, in alternative implementations each of the decorrelation filtering processes determined in block 806 may correspond to a particular stage of an overall decorrelation process. For example, in alternative implementations each of the decorrelation filtering processes determined in block 806 may correspond to a particular operation (or a group of related operations) within a sequence of operations related to generating a decorrelation signal for at least two channels. [00292] In block 808, the decorrelation filtering processes determined in block 806 are implemented. For example, block 808 may involve applying a decorrelation filter or filters to at least a portion of the received audio data to produce filtered audio data. The filtered audio data may, for example, correspond to the decorrelation signals 227 produced by the decorrelation signal generator 218, as described above with reference to Figures 2F, 4 and/or 6A to 6C. Block 808 may also involve various other operations, examples of which are provided below. [00293] Here, block 810 involves determining mixing parameters based, at least in part, on the audio characteristics. Block 810 may be performed, at least in part, by the mixer control module 660 of the control information receiver/generator 640 (see Figure 6C). In some implementations, the mixing parameters can be output-channel-specific mixing parameters.
For example, block 810 may involve receiving or estimating alpha values for each of the audio channels that will be decorrelated, and determining mixing parameters based, at least in part, on the alphas. In some implementations, the alphas can be modified according to transient control information, which can be determined by the transient control module 655 (see Figure 6C). In block 812, the filtered audio data can be mixed with a direct portion of the audio data according to the mixing parameters. [00294] Figure 8B is a flowchart illustrating blocks of a sign-flip method. In some implementations, the blocks shown in Figure 8B are examples of the "determination" block 806 and the "application" block 808 of Figure 8A. Consequently, these blocks are labeled "806a" and "808a" in Figure 8B. In this example, block 806a involves determining decorrelation filters and polarities for decorrelation signals of at least two adjacent channels, to cause a specific IDC between the decorrelation signals for the channel pair. In this implementation, block 820 involves applying one or more of the decorrelation filters determined in block 806a to at least a portion of the received audio data to produce filtered audio data. The filtered audio data may, for example, correspond to the decorrelation signals 227 produced by the decorrelation signal generator 218, as described above with reference to Figures 2E and 4. [00295] In some four-channel examples, block 820 may involve applying a first decorrelation filter to the audio data for a first and a second channel to produce first-channel filtered data and second-channel filtered data, and applying a second decorrelation filter to the audio data for a third and a fourth channel to produce third-channel filtered data and fourth-channel filtered data. For example, the first channel might be a left channel, the second channel might be a right channel, the third channel might be a left surround channel and the fourth channel might be a right surround channel. [00296] Decorrelation filters can be applied before or after the audio data are upmixed, depending on the particular implementation. In some implementations, for example, a decorrelation filter may be applied to a coupling channel of the audio data. Subsequently, an appropriate scaling factor for each channel can be applied. Some examples are described below with reference to Figure 8C. [00297] Figures 8C and 8D are block diagrams that illustrate components that can be used to implement some sign-flip methods. Referring first to Figure 8C, in this implementation a decorrelation filter is applied to a coupling channel of the input audio data in block 820. In the example shown in Figure 8C, the decorrelation signal generator control information 625 and the audio data 210, including frequency domain representations that correspond to the coupling channel, are received by the decorrelation signal generator 218. In this example, the decorrelation signal generator 218 outputs decorrelation signals 227 that are the same for all the channels that will be decorrelated. [00298] The process 808a of Figure 8B may involve performing operations on the filtered audio data to produce decorrelation signals that have a specific inter-decorrelation-signal coherence (IDC) between the decorrelation signals for at least one pair of channels. In this implementation, block 825 involves applying a polarity to the filtered audio data produced in block 820. In this example, the polarity applied in block 825 was determined in block 806a.
In some implementations, block 825 involves inverting a polarity between filtered audio data for adjacent channels. For example, block 825 may involve multiplying filtered audio data corresponding to a left channel or a right channel by -1. Block 825 may involve inverting a polarity of filtered audio data corresponding to a left surround channel relative to filtered audio data corresponding to the left channel. Block 825 may also involve inverting a polarity of filtered audio data corresponding to a right surround channel relative to filtered audio data corresponding to the right channel. In the four-channel example described above, block 825 may involve inverting a polarity of the first-channel filtered data relative to the second-channel filtered data, and inverting a polarity of the third-channel filtered data relative to the fourth-channel filtered data. [00299] In the example shown in Figure 8C, the decorrelation signals 227, also denoted y, are received by the polarity inversion module 840. The polarity inversion module 840 is configured to invert the polarity of decorrelation signals for adjacent channels. In this example, the polarity inversion module 840 is configured to invert the polarity of the decorrelation signals for the right channel and the left surround channel. However, in other implementations, the polarity inversion module 840 can be configured to invert the polarity of decorrelation signals for other channels. For example, the polarity inversion module 840 can be configured to invert the polarity of the decorrelation signals for the left channel and the right surround channel. Other implementations may involve inverting the polarity of decorrelation signals for still other channels, depending on the number of channels involved and their spatial relationships. [00300] The polarity inversion module 840 supplies the decorrelation signals 227, including the sign-flipped decorrelation signals, to the channel-specific mixers 215a to 215d. The channel-specific mixers 215a to 215d also receive the unfiltered, direct audio data 210 of the coupling channel and the output-channel-specific spatial parameter information 630a to 630d. Alternatively or additionally, in some implementations the channel-specific mixers 215a to 215d may receive the modified mixing coefficients 890, which are described below with reference to Figure 8F. In this example, the output-channel-specific spatial parameter information 630a to 630d has been modified according to transient data, for example, according to input from a transient control module such as the one depicted in Figure 6C. Examples of modifying spatial parameters according to transient data are provided below. [00301] In this implementation, the channel-specific mixers 215a to 215d mix the decorrelation signals 227 with the direct audio data 210 of the coupling channel according to the output-channel-specific spatial parameter information 630a to 630d, and output the resulting output-channel-specific mixed audio data 845a to 845d to the gain control modules 850a to 850d. In this example, the gain control modules 850a to 850d are configured to apply output-channel-specific gains, also referred to herein as scaling factors, to the output-channel-specific mixed audio data 845a to 845d. [00302] An alternative sign-flip method will now be described with reference to Figure 8D.
In this example, channel-specific decorrelation filters, based at least in part on the channel-specific decorrelation signal generator control information 847a to 847d, are applied by the decorrelation signal generators 218a to 218d to the audio data 210a to 210d. In some implementations, the decorrelation signal generator control information 847a to 847d may be received in a bitstream along with the audio data, while in other implementations the decorrelation signal generator control information 847a to 847d can be generated locally (at least in part), for example, by the decorrelation filter control module 405. Here, the decorrelation signal generators 218a to 218d can also generate the channel-specific decorrelation filters according to decorrelation filter coefficient information received from the decorrelation filter control module 405. In some implementations, a single filter description, shared by all the channels, may be generated by the decorrelation filter control module 405. [00303] In this example, a channel-specific gain/scaling factor was applied to the audio data 210a to 210d before the audio data 210a to 210d were received by the decorrelation signal generators 218a to 218d. For example, if the audio data were encoded according to the AC-3 or E-AC-3 audio codecs, the scaling factors could be coupling coordinates, or "cplcoords", which are encoded with the rest of the audio data and received in a bitstream by an audio processing system such as a decoding apparatus. In some implementations, the cplcoords can also be the basis for the output-channel-specific scaling factors applied by the gain control modules 850a to 850d to the output-channel-specific mixed audio data 845a to 845d (see Figure 8C). [00304] Consequently, the decorrelation signal generators 218a to 218d output channel-specific decorrelation signals 227a to 227d for all the channels to be decorrelated. The decorrelation signals 227a to 227d are also denoted yL, yR, yLS and yRS, respectively, in Figure 8D. [00305] The decorrelation signals 227a to 227d are received by the polarity inversion module 840. The polarity inversion module 840 is configured to invert the polarity of decorrelation signals for adjacent channels. In this example, the polarity inversion module 840 is configured to invert the polarity of the decorrelation signals for the right channel and the left surround channel. However, in other implementations, the polarity inversion module 840 can be configured to invert the polarity of decorrelation signals for other channels. For example, the polarity inversion module 840 can be configured to invert the polarity of the decorrelation signals for the left channel and the right surround channel. Other implementations may involve inverting the polarity of decorrelation signals for still other channels, depending on the number of channels involved and their spatial relationships. [00306] The polarity inversion module 840 provides the decorrelation signals 227a to 227d, including the sign-flipped decorrelation signals 227b and 227c, to the channel-specific mixers 215a to 215d. Here, the channel-specific mixers 215a to 215d also receive the direct audio data 210a to 210d and the output-channel-specific spatial parameter information 630a to 630d. In this example, the output-channel-specific spatial parameter information 630a to 630d has been modified according to transient data.
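The following is a minimal Python sketch of the single-filter sign-flip arrangement of Figure 8C, assuming the Equation-1-style mixing described later in this document (direct coupling channel plus weighted decorrelation signal, scaled by the channel's cplcoord); the function name and channel assignments are illustrative.

```python
import numpy as np

def mix_output_channel(x, d, alpha, cplcoord, flip=False):
    """One output channel: mix the direct coupling channel x with the
    decorrelation signal d using the channel's alpha, then scale by its
    cplcoord. `flip` applies the sign flip used for adjacent channels."""
    sign = -1.0 if flip else 1.0
    return cplcoord * (alpha * x + np.sqrt(1.0 - alpha ** 2) * sign * d)

# With a single decorrelation signal d for all channels, flipping the
# right channel and the left surround channel drives adjacent IDCs
# toward -1:
# l_out  = mix_output_channel(x, d, alpha_l,  g_l)
# r_out  = mix_output_channel(x, d, alpha_r,  g_r,  flip=True)
# ls_out = mix_output_channel(x, d, alpha_ls, g_ls, flip=True)
# rs_out = mix_output_channel(x, d, alpha_rs, g_rs)
```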
[00307] In this implementation, the channel-specific mixers 215a to 215d mix the decorrelation signals 227a to 227d with the direct audio data 210a to 210d according to the output-channel-specific spatial parameter information 630a to 630d, and output the output-channel-specific mixed audio data 845a to 845d. [00308] Alternative methods for restoring the spatial relationships between discrete input channels are provided in this document. These methods may involve systematically determining synthesis coefficients that determine how decorrelation or reverberation signals will be synthesized. According to some of these methods, ideal IDCs are determined from target alphas and ICCs. These methods may involve systematically synthesizing a set of channel-specific decorrelation signals according to the IDCs that are determined to be ideal. [00309] An overview of some of these systematic methods will now be described with reference to Figures 8E and 8F. Additional details, including the underlying mathematical formulas for some examples, are described later. [00310] Figure 8E is a flowchart illustrating blocks of a method of determining synthesis coefficients and mixing coefficients from spatial parameter data. Figure 8F is a block diagram showing examples of mixer components. In this example, method 851 starts after blocks 802 and 804 of Figure 8A. Accordingly, the blocks shown in Figure 8E can be considered additional examples of the "determination" block 806 and the "application" block 808 of Figure 8A. Therefore, blocks 855 to 865 of Figure 8E are labeled "806b", and blocks 820 and 870 are labeled "808b". [00311] In this example, however, the decorrelation processes determined in block 806 may involve performing operations on the filtered audio data according to synthesis coefficients. Some examples are provided below. [00312] Optional block 855 may involve converting from one form of spatial parameters to an equivalent representation. Referring to Figure 8F, for example, the synthesis and mixing coefficient generation module 880 may receive spatial parameter information 630b, which includes information describing the spatial relationships between N input channels, or a subset of these spatial relationships. The module 880 may be configured to convert at least some of the spatial parameter information 630b from one form of spatial parameters to an equivalent representation. For example, alphas can be converted into ICCs, or vice versa. [00313] In alternative implementations of the audio processing system, at least some of the functionality of the synthesis and mixing coefficient generation module 880 may be performed by elements other than the mixer 215. For example, in some alternative implementations, at least some of the functionality of the synthesis and mixing coefficient generation module 880 can be performed by a control information receiver/generator 640 such as the one shown in Figure 6C and described above. [00314] In this implementation, block 860 involves determining a desired spatial relationship between output channels in terms of a spatial parameter representation. As shown in Figure 8F, in some implementations the synthesis and mixing coefficient generation module 880 may receive downmix/upmix information 635, which may include information that corresponds to the mixing information 266 received by the N-to-M upmixer/downmixer 262 and/or the mixing information 268 received by the M-to-K upmixer/downmixer 264 of Figure 2E.
The synthesis and mixing coefficient generation module 880 may also receive spatial parameter information 630a, which includes information describing the spatial relationships between K output channels, or a subset of these spatial relationships. As described above with reference to Figure 2E, the number of input channels may or may not equal the number of output channels. The module 880 can be configured to calculate a desired spatial relationship (e.g., an ICC) between at least some pairs of the K output channels. [00315] In this example, block 865 involves determining synthesis coefficients based on the desired spatial relationships. Mixing coefficients can also be determined, based at least in part on the desired spatial relationships. Referring again to Figure 8F, in block 865 the synthesis and mixing coefficient generation module 880 can determine the decorrelation signal synthesis parameters 615 in accordance with the desired spatial relationships between the output channels. The synthesis and mixing coefficient generation module 880 can also determine the mixing coefficients 620 according to the desired spatial relationships between the output channels. [00316] The synthesis and mixing coefficient generation module 880 can supply the decorrelation signal synthesis parameters 615 to the synthesizer 605. In some implementations, the decorrelation signal synthesis parameters 615 may be output-channel-specific. In this example, the synthesizer 605 also receives the decorrelation signals 227, which can be produced by a decorrelation signal generator 218 such as the one shown in Figure 6A. [00317] In this example, block 820 involves applying one or more decorrelation filters to at least a portion of the received audio data to produce filtered audio data. The filtered audio data may, for example, correspond to the decorrelation signals 227 produced by the decorrelation signal generator 218, as described above with reference to Figures 2E and 4. [00318] Block 870 may involve synthesizing decorrelation signals according to the synthesis coefficients. In some implementations, block 870 may involve synthesizing decorrelation signals by performing operations on the filtered audio data produced in block 820. As such, the synthesized decorrelation signals may be considered a modified version of the filtered audio data. In the example shown in Figure 8F, the synthesizer 605 can be configured to perform operations on the decorrelation signals 227 in accordance with the decorrelation signal synthesis parameters 615, and to output the synthesized decorrelation signals 886 to the direct signal and decorrelation signal mixer 610. Here, the synthesized decorrelation signals 886 are channel-specific synthesized decorrelation signals. In some such implementations, block 870 may involve multiplying the channel-specific synthesized decorrelation signals by appropriate scaling factors for each channel to produce the scaled channel-specific synthesized decorrelation signals 886. In this example, the synthesizer 605 forms linear combinations of the decorrelation signals 227 according to the decorrelation signal synthesis parameters 615.
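The following is a minimal Python sketch of such a linear combination, assuming the synthesis parameters are arranged as a matrix of per-output-channel weights over a set of seed decorrelation signals; the names are illustrative.

```python
import numpy as np

def synthesize_decorrelation_signals(seeds, synthesis_matrix):
    """Form output-channel-specific synthesized decorrelation signals
    as linear combinations of mutually uncorrelated seed signals.

    seeds:            array of shape (num_seeds, num_samples)
    synthesis_matrix: array of shape (num_output_channels, num_seeds),
                      the decorrelation signal synthesis parameters
    """
    return np.asarray(synthesis_matrix) @ np.asarray(seeds)
```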
[00319] The synthesis and mixing coefficient generation module 880 can supply the mixing coefficients 620 to a mixer transient control module 888. In this implementation, the mixing coefficients 620 are output-channel-specific mixing coefficients. The mixer transient control module 888 can receive the transient control information 430. The transient control information 430 can be received along with the audio data or can be determined locally, for example, by a transient control module such as the transient control module 655 shown in Figure 6C. The mixer transient control module 888 can output modified mixing coefficients 890, based at least in part on the transient control information 430, and can provide the modified mixing coefficients 890 to the direct signal and decorrelation signal mixer 610. [00320] The direct signal and decorrelation signal mixer 610 can mix the synthesized decorrelation signals 886 with the unfiltered, direct audio data 220. In this example, the audio data 220 include audio data elements that correspond to N input channels. The direct signal and decorrelation signal mixer 610 mixes the audio data elements and the channel-specific synthesized decorrelation signals 886 on an output-channel-specific basis, and outputs the decorrelated audio data 230 on N or M output channels, depending on the particular implementation (see, for example, Figure 2E and the corresponding description). [00321] The following are detailed examples of some of the processes of method 851. Although these methods are described, at least in part, with reference to the capabilities of the AC-3 and E-AC-3 audio codecs, the methods have wide applicability to many other audio codecs. [00322] The goal of some of these methods is to reproduce all the ICCs (or a selected set of ICCs) accurately, in order to restore spatial characteristics of the source audio data that may have been lost due to channel coupling. The functionality of a mixer can be formulated as:

yi = gi(αi x + sqrt(1 - αi^2) Di(x)) (Equation 1)

[00323] In Equation 1, x represents a coupling channel signal, αi represents the alpha spatial parameter for channel i, gi represents the "cplcoord" (which corresponds to a scaling factor) for channel i, yi represents the decorrelated signal, and Di(x) represents the decorrelation signal generated by the decorrelation filter Di. It is desirable for the output of the decorrelation filter to have the same spectral power distribution as the input audio data, but to be uncorrelated with the input audio data. According to the AC-3 and E-AC-3 audio codecs, cplcoords and alphas are specified per channel and per coupling frequency band, while the signals and the filter are per frequency range. Also, the signal samples correspond to the coefficient blocks of the filter bank. These time and frequency indices are omitted here for the sake of simplicity. [00324] Alpha values represent the correlation between the discrete channels of the source audio data and the coupling channel, which can be expressed as follows:

αi = E[si x*] / sqrt(E[x x*] E[si si*]) (Equation 2)

[00325] In Equation 2, E[·] represents the expected value of the term(s) inside the square brackets, x* represents the complex conjugate of x, and si represents a discrete signal for channel i. [00326] The inter-channel coherence, or ICC, between a pair of decorrelated signals can be derived as follows:

ICCi1,i2 = αi1 αi2 + sqrt(1 - αi1^2) sqrt(1 - αi2^2) IDCi1,i2 (Equation 3)

[00327] In Equation 3, IDCi1,i2 represents the inter-decorrelation-signal coherence ("IDC") between Di1(x) and Di2(x). With fixed alphas, the ICC is maximized when the IDC is +1 and minimized when the IDC is -1. When the ICC of the source audio data is known, the ideal IDC required to replicate it can be resolved as:

IDCi1,i2 = (ICCi1,i2 - αi1 αi2) / (sqrt(1 - αi1^2) sqrt(1 - αi2^2)) (Equation 4)

[00328] The ICC between the decorrelated signals can thus be controlled by selecting decorrelation signals that satisfy the ideal IDC condition of Equation 4. Some methods for generating these decorrelation signals are discussed below.
Prior to that discussion, it may be useful to describe the relationships between some of these spatial parameters, particularly those between ICCs and alphas. [00329] As noted above with reference to optional block 855 of method 851, some implementations provided in this document may involve converting from one form of spatial parameter to an equivalent representation. In some of these implementations, optional block 855 may involve converting from alphas to ICCs or vice versa. For example, the alphas can be uniquely determined if both the cplcoords (the corresponding scaling factors) and the ICCs are known. [00330] A coupling channel can be generated as follows:

x = gx·Σi si (Equation 5)

[00331] In Equation 5, si represents the discrete signal for channel i involved in coupling and gx represents an arbitrary gain adjustment applied to x. Replacing the term x in Equation 2 with the equivalent expression from Equation 5, the alpha for channel i can be expressed as follows:

αi = gx·Σj E[si·sj*] / √(E[x·x*]·E[si·si*])

[00332] The power of each discrete channel can be expressed in terms of the power of the coupling channel and the corresponding cplcoord as follows:

E[si·si*] = gi²·E[x·x*]

[00333] The cross-correlation terms can be replaced as follows:

E[si·sj*] = gi·gj·E[x·x*]·ICCi,j

[00334] Therefore, the alphas can be expressed this way (noting that ICCi,i = 1):

αi = gx·Σj gj·ICCi,j

[00335] Based on Equation 5, the power of x can be expressed as follows:

E[x·x*] = gx²·Σi Σj E[si·sj*] = gx²·E[x·x*]·Σi Σj gi·gj·ICCi,j

[00336] Therefore, the gain adjustment gx can be expressed as follows:

gx = 1 / √(Σi Σj gi·gj·ICCi,j)

[00337] Consequently, if all the cplcoords and ICCs are known, the alphas can be computed according to the following expression:

αi = Σj gj·ICCi,j / √(Σj Σk gj·gk·ICCj,k)

[00338] As noted above, the ICC between the decorrelated signals can be controlled by selecting decorrelation signals that satisfy Equation 4. In the stereo case, a single decorrelation filter can generate the decorrelation signals for both channels from the coupling channel signal. The ideal IDC of -1 can be achieved simply by flipping signs, for example, according to one of the sign-change methods described above. [00339] However, the task of controlling ICCs in multichannel cases is more complex. In addition to ensuring that all decorrelation signals are substantially uncorrelated with the coupling channel, the IDCs between the decorrelation signals must also satisfy Equation 4. [00340] In order to generate decorrelation signals with the desired IDCs, a set of mutually uncorrelated "seed" decorrelation signals can be generated first. For example, the decorrelation signals 227 can be generated according to methods described elsewhere herein. Subsequently, the desired decorrelation signals can be synthesized by linearly combining these seeds with appropriate weights. An overview of some examples is given above with reference to Figures 8E and 8F. [00341] It can be challenging to generate many high-quality, mutually uncorrelated (e.g., orthogonal) decorrelation signals from a downmix. Moreover, calculating the appropriate combination weights can involve matrix inversion, which can pose challenges in terms of complexity and stability. [00342] Consequently, in some examples provided in this document, an "anchor and expand" process may be implemented. In some implementations, some IDCs (and ICCs) may be more significant than others. For example, lateral ICCs may be perceptually more important than diagonal ICCs. In a Dolby 5.1 channel example, the ICCs for the L-R, L-Ls, R-Rs and Ls-Rs channel pairs may be perceptually more important than the ICCs for the L-Rs and R-Ls channel pairs. Front channels may be perceptually more important than rear or surround channels.
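The closed-form conversion of paragraph [00337] can be sketched as follows. The Python and NumPy usage and the names are illustrative assumptions:

import numpy as np

def alphas_from_cplcoords_and_iccs(g, icc):
    # Paragraph [00337]: given the cplcoords g (one per coupled channel) and
    # a symmetric ICC matrix with unit diagonal, return the per-channel alphas.
    g = np.asarray(g, dtype=float)
    icc = np.asarray(icc, dtype=float)
    gx = 1.0 / np.sqrt(g @ icc @ g)  # gain adjustment of Equation 5
    return gx * (icc @ g)            # alpha_i = gx * sum_j g_j * ICC_i,j

# Example: two coupled channels with cplcoords 0.8 and 0.6 and an ICC of 0.5.
print(alphas_from_cplcoords_and_iccs([0.8, 0.6], [[1.0, 0.5], [0.5, 1.0]]))

For the two-channel example shown, the computed alphas are approximately 0.90 and 0.82.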
[00343] In some of these implementations, the Equation 4 terms for the most important IDCs can be satisfied by first combining two orthogonal seed decorrelation signals to synthesize the decorrelation signals for the two channels involved. Then, using these synthesized decorrelation signals as anchors and adding new seeds, the Equation 4 terms for the secondary IDCs can be satisfied and the corresponding decorrelation signals can be synthesized. This process can be repeated until the Equation 4 terms are satisfied for all IDCs. These implementations allow higher-quality decorrelation signals to be used to control the relatively more critical ICCs. [00344] Figure 9 is a flowchart that outlines a process of synthesizing decorrelation signals in multichannel cases. The blocks of method 900 can be considered further examples of the "determining" process of block 806 of Figure 8A and of the "applying" process of block 808 of Figure 8A. Accordingly, in Figure 9, blocks 905 to 915 are labeled "806c" and blocks 920 and 925 of method 900 are labeled "808c". Method 900 provides an example in a 5.1 channel context. However, method 900 has wide applicability to other contexts. [00345] In this example, blocks 905 to 915 involve calculating synthesis parameters to be applied to a set of mutually uncorrelated seed decorrelation signals, Dni(x), that are generated in block 920. In some 5.1 channel implementations, i = {1, 2, 3, 4}. If the center channel is to be decorrelated, a fifth seed decorrelation signal may be involved. In some implementations, the mutually uncorrelated (orthogonal) decorrelation signals Dni(x) can be generated by feeding the mono downmix signal into several different decorrelation filters. Alternatively, the upmixed signals can each be fed into a distinct decorrelation filter. Several examples are provided below. [00346] As noted above, the front channels may be perceptually more important than the rear or surround channels. Therefore, in method 900, the decorrelation signals for the L and R channels are jointly anchored on the first two seeds, and the decorrelation signals for the Ls and Rs channels are then synthesized using these anchors and the remaining seeds. [00347] In this example, block 905 involves calculating the synthesis parameters p and pr for the front L and R channels. Here, p and pr are derived from the L-R IDC. [00348] Therefore, block 905 also involves calculating the L-R IDC from Equation 4. Consequently, in this example, ICC information is used to calculate the L-R IDC. Other processes of the method can also use ICC values as input. The ICC values can be obtained from the encoded bitstream or by estimation on the decoder side, e.g., based on uncoupled lower or higher frequency bands, cplcoords, alphas, etc. [00349] The synthesis parameters p and pr can be used to synthesize the decorrelation signals for the L and R channels in block 925. The decorrelation signals for the Ls and Rs channels can then be synthesized using the decorrelation signals for the L and R channels as anchors. [00350] In some implementations, it may be desirable to control the Ls-Rs ICC. According to method 900, synthesizing the intermediate decorrelation signals D'Ls(x) and D'Rs(x) from two of the seed decorrelation signals involves calculating the synthesis parameters a and ar. Therefore, optional block 910 involves calculating the synthesis parameters a and ar for the surround channels.
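Before continuing with the surround channels, the front-pair combination of block 905 can be illustrated with a minimal sketch. The equal-power symmetric weighting shown is one construction that yields a target L-R IDC from two orthogonal, unit-power seeds; it is an illustrative assumption rather than the exact equations of method 900, and the Python and NumPy usage and the names are likewise assumptions:

import numpy as np

def synthesize_front_pair(seed1, seed2, idc_lr):
    # Combine two orthogonal, unit-power seed decorrelation signals so that
    # the two outputs have unit power and mutual correlation idc_lr:
    # <d_l, d_r> = p**2 - pr**2 = idc_lr, with p**2 + pr**2 = 1.
    p = np.sqrt((1.0 + idc_lr) / 2.0)
    pr = np.sqrt((1.0 - idc_lr) / 2.0)
    d_l = p * seed1 + pr * seed2
    d_r = p * seed1 - pr * seed2
    return d_l, d_r

# Example with white-noise stand-ins for the seed decorrelation signals:
rng = np.random.default_rng(0)
n1, n2 = rng.standard_normal((2, 48000))
d_l, d_r = synthesize_front_pair(n1, n2, idc_lr=-0.3)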
It can be derived that the required correlation coefficient between the intermediate decorrelation signals D'Ls(x) and D'Rs(x) is determined by the desired Ls-Rs ICC. [00351] The variables a and ar can be derived from this correlation coefficient. [00352] D'Ls(x) and D'Rs(x) can then be defined as linear combinations of two of the seed decorrelation signals, weighted by a and ar. [00353] However, if the Ls-Rs ICC is not a concern, the correlation coefficient between D'Ls(x) and D'Rs(x) can be set to -1. Consequently, the two signals can simply be sign-flipped versions of each other, constructed from the remaining seed decorrelation signals. [00354] The center channel may or may not be decorrelated, depending on the particular implementation. Accordingly, the process of calculating the synthesis parameters t1 and t2 for the center channel in block 915 is optional. The synthesis parameters for the center channel can be calculated, for example, if it is desirable to control the L-C and R-C ICCs. If so, a fifth seed, Dn5(x), can be added, and the decorrelation signal for the C channel can be expressed in terms of the synthesis parameters t1 and t2 and this additional seed. [00355] In order to achieve the desired L-C and R-C ICCs, Equation 4 must be satisfied for the L-C and R-C IDCs; in the corresponding expressions, the asterisks indicate complex conjugates. [00356] Consequently, the synthesis parameters t1 and t2 for the center channel can be determined from these conditions. [00357] In block 920, a set of mutually uncorrelated seed decorrelation signals, Dni(x), i = {1, 2, 3, 4}, can be generated. If the center channel is to be decorrelated, a fifth seed decorrelation signal can be generated in block 920. These mutually uncorrelated (orthogonal) decorrelation signals Dni(x) can be generated by feeding the mono downmix signal into several different decorrelation filters. [00358] In this example, block 925 involves applying the terms derived above to synthesize the decorrelation signals. [00359] In this example, the equations for synthesizing the decorrelation signals for the Ls and Rs channels (DLs(x) and DRs(x)) depend on the equations for synthesizing the decorrelation signals for the L and R channels (DL(x) and DR(x)). In method 900, the decorrelation signals for the L and R channels are anchored together to mitigate potential left-right drift due to imperfect decorrelation signals. [00360] In the example above, the seed decorrelation signals are generated from the mono downmix signal x in block 920. Alternatively, the seed decorrelation signals can be generated by feeding each upmixed signal into a distinct decorrelation filter. In this case, the generated seed decorrelation signals would be channel-specific: Dni(gi·x), i = {L, R, Ls, Rs, C}. These channel-specific seed decorrelation signals would, in general, have different power levels due to the scaling applied in the upmix process. Consequently, it is desirable to align the power levels of these seeds when combining them. To achieve this, the synthesis equations for block 925 can be modified accordingly. [00361] In the modified synthesis equations, all synthesis parameters remain the same. However, level adjustment parameters λi,j are required to align the power level when using a seed decorrelation signal generated from channel j to synthesize the decorrelation signal for channel i.
These channel-pair-specific level adjustment parameters can be computed based on estimated channel level differences, for example as the ratio of the corresponding cplcoords:

λi,j = gi / gj

[00362] Also, since the channel-specific scaling factors are already incorporated into the decorrelation signals synthesized in this case, the mixer equation for block 812 (Figure 8A) should be modified from Equation 1 to:

yi = gi·αi·x + √(1 − αi²)·Di(gi·x)

[00363] As noted elsewhere in this document, in some implementations spatial parameters may be received along with the audio data. The spatial parameters may, for example, have been encoded with the audio data. The encoded spatial parameters and audio data may be received in a bitstream by an audio processing system such as a decoder, for example, as described above with reference to Figure 2D. In that example, the spatial parameters are received by the decorrelator 205 through the explicit decorrelation information 240. [00364] However, in alternative implementations, no encoded spatial parameters (or an incomplete set of spatial parameters) are received by the decorrelator 205. According to some of these implementations, the control information receiver/generator 640, described above with reference to Figures 6B and 6C (or another element of an audio processing system 200), may be configured to estimate spatial parameters based on one or more attributes of the audio data. In some implementations, the control information receiver/generator 640 may include a spatial parameter module 665 that is configured for spatial parameter estimation and the related functionality described herein. For example, the spatial parameter module 665 can estimate spatial parameters for frequencies in a coupling channel frequency range based on audio data characteristics outside the coupling channel frequency range. Some of these implementations will now be described with reference to Figures 10A et seq. [00365] Figure 10A is a flow diagram that provides an overview of a method for estimating spatial parameters. In block 1005, audio data including a first set of frequency coefficients and a second set of frequency coefficients is received by an audio processing system. For example, the first and second sets of frequency coefficients may be the result of applying a modified discrete sine transform, a modified discrete cosine transform or a lapped orthogonal transform to audio data in a time domain. In some implementations, the audio data may have been encoded using a legacy encoding process. For example, the legacy encoding process may be an AC-3 audio codec process or an Enhanced AC-3 audio codec process. Thus, in some implementations, the first and second sets of frequency coefficients may be real-valued frequency coefficients. However, method 1000 is not limited in its application to these codecs, but is broadly applicable to many audio codecs. [00366] The first set of frequency coefficients may correspond to a first frequency range and the second set of frequency coefficients may correspond to a second frequency range. For example, the first frequency range may correspond to an individual channel frequency range and the second frequency range may correspond to a received coupling channel frequency range. In some implementations, the first frequency range may be below the second frequency range. However, in alternative implementations, the first frequency range may be above the second frequency range.
[00367] Referring to Figure 2D, in some implementations, the first set of frequency coefficients may correspond to the audio data 245a or 245b, which includes frequency domain representations of audio data outside a coupling channel frequency range. The audio data 245a and 245b are not decorrelated in this example, but can nevertheless be used as input to the spatial parameter estimation performed by the decorrelator 205. The second set of frequency coefficients can correspond to the audio data 210 or 220, which includes frequency domain representations that correspond to a coupling channel. However, unlike the example of Figure 2D, method 1000 may not involve receiving spatial parameter data along with the frequency coefficients for the coupling channel. [00368] In block 1010, spatial parameters for at least part of the second set of frequency coefficients are estimated. In some implementations, the estimation is based on one or more aspects of estimation theory. For example, the estimation process may be based, at least in part, on a maximum likelihood method, a Bayes estimator, a method of moments estimator, a minimum mean squared error estimator and/or a minimum variance unbiased estimator. [00369] Some such implementations may involve estimating joint probability density functions ("PDFs") of the lower-frequency and higher-frequency spatial parameters. For example, suppose there are two channels, L and R, and that each channel has a low band in the individual channel frequency range and a high band in the coupling channel frequency range. One can thus have an ICC_lo, which represents the inter-channel coherence between the L and R channels in the individual channel frequency range, and an ICC_hi, which exists in the coupling channel frequency range. [00370] Given a large training set of audio signals, the signals can be segmented and, for each segment, ICC_lo and ICC_hi can be calculated. In this way, one can obtain a large training set of ICC pairs (ICC_lo, ICC_hi). A joint PDF of this pair of parameters can be computed as a histogram and/or modeled using parametric models (e.g., Gaussian mixture models). This model may be a time-invariant model that is known to the decoder. Alternatively, the model parameters can be sent regularly to the decoder via the bitstream. [00371] At the decoder, ICC_lo for a particular segment of received audio data can be calculated, for example, in the manner described herein for computing cross-correlation coefficients between individual channels and the composite coupling channel. Given this ICC_lo value and the joint PDF model of the parameters, the decoder can estimate what ICC_hi is. One such estimate is the maximum likelihood ("ML") estimate, wherein the decoder can calculate the conditional PDF of ICC_hi given the value of ICC_lo. This conditional PDF is essentially a positive real-valued function that can be represented on an x-y plane, with the x-axis representing the continuum of ICC_hi values and the y-axis representing the conditional probability of each such value. The ML estimate involves choosing, as the estimate of ICC_hi, the value at which this function peaks. On the other hand, the minimum mean squared error ("MMSE") estimate is the mean of this conditional PDF, which is another valid estimate of ICC_hi. Estimation theory provides many such tools for producing an estimate of ICC_hi. [00372] The two-parameter example above is a very simple case. In some implementations, there may be a greater number of channels as well as bands.
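For the two-parameter example above, a minimal sketch of the ML and MMSE estimates is given below, assuming the joint PDF has been tabulated as a two-dimensional histogram over (ICC_lo, ICC_hi) bins. The Python and NumPy usage and all names are illustrative assumptions:

import numpy as np

def estimate_icc_hi(icc_lo, joint_hist, lo_edges, hi_centers):
    # Select the histogram row corresponding to the observed ICC_lo; that row
    # is an (un-normalized) conditional PDF of ICC_hi given ICC_lo. The ML
    # estimate is its peak; the MMSE estimate is its mean.
    lo_bin = np.clip(np.searchsorted(lo_edges, icc_lo) - 1,
                     0, joint_hist.shape[0] - 1)
    conditional = joint_hist[lo_bin].astype(float)
    ml = hi_centers[np.argmax(conditional)]
    mmse = np.sum(hi_centers * conditional) / max(conditional.sum(), 1e-12)
    return ml, mmse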
Spatial parameters can be alphas or ICCs. Furthermore, the PDF model can be conditioned on signal type. For example, there may be a different model for transients, a different model for tonal signals, etc. [00373] In this example, the estimation of block 1010 is based, at least in part, on the first set of frequency coefficients. For example, the first set of frequency coefficients may include audio data for two or more individual channels in a first frequency range that is outside a received coupling channel frequency range. The estimation process may involve calculating combined frequency coefficients of a composite coupling channel within the first frequency range, based on the frequency coefficients of the two or more channels. The estimation process may also involve computing cross-correlation coefficients between the combined frequency coefficients and the frequency coefficients of the individual channels within the first frequency range. The results of the estimation process may vary according to temporal changes in the input audio signals. [00374] In block 1015, the estimated spatial parameters can be applied to the second set of frequency coefficients to generate a modified second set of frequency coefficients. In some implementations, the process of applying the estimated spatial parameters to the second set of frequency coefficients may be part of a decorrelation process. The decorrelation process may involve generating a reverberation signal or a decorrelation signal and applying it to the second set of frequency coefficients. In some implementations, the decorrelation process may involve applying a decorrelation algorithm that operates entirely on real-valued coefficients. The decorrelation process may involve selective or signal-adaptive decorrelation of specific channels and/or specific frequency bands. [00375] A more detailed example will now be described with reference to Figure 10B. Figure 10B is a flow diagram that provides an overview of an alternative method for estimating spatial parameters. Method 1020 may be performed by an audio processing system, such as a decoder. For example, method 1020 may be performed, at least in part, by a control information receiver/generator 640 such as the one illustrated in Figure 6C. [00376] In this example, the first set of frequency coefficients is in an individual channel frequency range. The second set of frequency coefficients corresponds to a coupling channel that is received by an audio processing system. The second set of frequency coefficients is in a received coupling channel frequency range, which is above the individual channel frequency range in this example. [00377] Thus, block 1022 involves receiving audio data for the individual channels and for the received coupling channel. In some implementations, the audio data may have been encoded using a legacy encoding process. Applying spatial parameters that are estimated according to method 1000 or method 1020 to the received coupling channel audio data can provide more spatially accurate audio reproduction than that obtained by decoding the received audio data according to a legacy decoding process corresponding to the legacy encoding process. In some implementations, the legacy encoding process may be an AC-3 audio codec process or an Enhanced AC-3 audio codec process. Thus, in some implementations, block 1022 may involve receiving real-valued frequency coefficients, but not frequency coefficients having imaginary values.
However, method 1020 is not limited to these codecs, but is broadly applicable to many audio codecs. [00378] In block 1025 of method 1020, at least a portion of the individual channel frequency range is divided into a plurality of frequency bands. For example, the individual channel frequency range can be divided into 2, 3, 4 or more frequency bands. In some implementations, each of the frequency bands may include a predetermined number of consecutive frequency coefficients, for example, 6, 8, 10, 12 or more consecutive frequency coefficients. In some implementations, only part of the individual channel frequency range may be divided into frequency bands. For example, some implementations may involve dividing only a higher-frequency portion of the individual channel frequency range (relatively closer to the received coupling channel frequency range) into frequency bands. According to some examples based on E-AC-3, a higher-frequency portion of the individual channel frequency range can be divided into 2 or 3 bands, each of which includes 12 MDCT coefficients. According to some such implementations, only the portion of the individual channel frequency range that is above 1 kHz, above 1.5 kHz, etc., may be divided into frequency bands. [00379] In this example, block 1030 involves computing the energy in the individual channel frequency bands. In this example, if an individual channel has been excluded from coupling, the in-band energy of the excluded channel is not computed in block 1030. In some implementations, the energy values computed in block 1030 may be smoothed. [00380] In this implementation, a composite coupling channel, based on the audio data of the individual channels in the individual channel frequency range, is created in block 1035. Block 1035 may involve calculating frequency coefficients for the composite coupling channel, which may be referred to herein as "combined frequency coefficients". The combined frequency coefficients may be created using the frequency coefficients of two or more channels in the individual channel frequency range. For example, if the audio data has been encoded according to the E-AC-3 codec, block 1035 may involve computing a local downmix of the MDCT coefficients below the "coupling start frequency", which is the lowest frequency in the received coupling channel frequency range. [00381] The power of the composite coupling channel, within each frequency band of the individual channel frequency range, can be determined in block 1040. In some implementations, the power values computed in block 1040 may be smoothed. [00382] In this example, block 1045 involves determining cross-correlation coefficients, which correspond to the correlation between the frequency bands of the individual channels and the corresponding frequency bands of the composite coupling channel. Here, computing the cross-correlation coefficients in block 1045 also involves computing the energy in the frequency bands of each of the individual channels and the energy in the corresponding frequency bands of the composite coupling channel. The cross-correlation coefficients can be normalized. According to some implementations, if an individual channel has been excluded from coupling, the frequency coefficients of the excluded channel are not used in computing the cross-correlation coefficients. [00383] Block 1050 involves estimating spatial parameters for each channel that has been coupled into the received coupling channel.
In this implementation, block 1050 involves estimating the spatial parameters based on the cross-correlation coefficients. The estimation process may involve averaging the normalized cross-correlation coefficients across all of the individual channel frequency bands. The estimation process may also involve applying a scaling factor to the average of the normalized cross-correlation coefficients to obtain the estimated spatial parameters for the individual channels that were coupled into the received coupling channel. In some implementations, the scaling factor may decrease with increasing frequency. [00384] In this example, block 1055 involves adding noise to the estimated spatial parameters. The noise can be added to model the variance of the estimated spatial parameters. The noise can be added according to a set of rules that correspond to the expected variation of the spatial parameter across frequency bands. The rules can be based on empirical data. The empirical data may correspond to observations and/or measurements derived from a large set of audio data samples. In some implementations, the variance of the added noise may be based on the estimated spatial parameter for a frequency band, on a frequency band index and/or on the variance of the normalized cross-correlation coefficients. [00385] Some implementations may involve receiving or determining tonality information with respect to the first or second set of frequency coefficients. According to some such implementations, the processes of blocks 1050 and/or 1055 can be varied according to the tonality information. For example, if the control information receiver/generator 640 of Figure 6B or Figure 6C determines that the audio data in the coupling channel frequency range is highly tonal, the control information receiver/generator 640 can be configured to temporarily reduce the amount of noise added at block 1055. [00386] In some implementations, the estimated spatial parameters may be estimated alphas for the received coupling channel frequency bands. Some such implementations may involve applying the alphas to the audio data corresponding to the coupling channel, for example, as part of a decorrelation process. [00387] More detailed examples of method 1020 will now be described. These examples are provided in the context of the E-AC-3 audio codec. However, the concepts illustrated by these examples are not limited to the context of the E-AC-3 audio codec, but rather are broadly applicable to many audio codecs. [00388] In this example, the composite coupling channel is computed as a mixture of the discrete sources:

xD = gx·Σi sDi (Equation 8)

[00389] In Equation 8, sDi represents the row vector of decoded MDCT transform coefficients of a specific frequency region (kstart..kend) of channel i, with kend = KCPL, the bin index corresponding to the E-AC-3 coupling start frequency, the lowest frequency of the received coupling channel frequency range. Here, gx represents a normalization term that does not impact the estimation process. In some implementations, gx can be set to 1. [00390] The decision regarding the number of bins analyzed between kstart and kend can be based on a trade-off between complexity constraints and the desired alpha estimation precision. In some implementations, kstart may correspond to a frequency at or above a particular threshold (e.g., 1 kHz), so that audio data in a frequency range relatively closer to the received coupling channel frequency range is used, in order to improve the estimation of the alpha values. The frequency region (kstart..kend) can be divided into frequency bands.
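A minimal sketch of the local downmix of Equation 8 (with gx set to 1) and of this band division is given below. The Python and NumPy usage, the names and the default band size of 12 bins (per the E-AC-3-based examples above) are illustrative assumptions:

import numpy as np

def local_downmix_and_bands(mdct_rows, k_start, k_cpl, band_size=12):
    # Equation 8 with gx = 1: sum the decoded MDCT rows of the coupled
    # channels over bins [k_start, k_cpl) to form the composite coupling
    # channel xD, then split that region into bands for correlation analysis.
    x_d = np.sum([row[k_start:k_cpl] for row in mdct_rows], axis=0)
    n_bands = len(x_d) // band_size
    bands = [slice(b * band_size, (b + 1) * band_size) for b in range(n_bands)]
    return x_d, bands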
In some implementations, the cross-correlation coefficients for these frequency bands may be computed as follows:

cci(l) = E{sDi(l)·xD(l)^T} / √(E{sDi(l)·sDi(l)^T}·E{xD(l)·xD(l)^T}) (Equation 9)

[00391] In Equation 9, sDi(l) represents the segment of sDi that corresponds to band l of the lower frequency region, and xD(l) represents the corresponding segment of xD. In some implementations, the expectation E{} can be approximated using a simple zero-pole infinite impulse response ("IIR") filter, for example, as follows:

E{y}(n) = (1 − a)·E{y}(n − 1) + a·y(n) (Equation 10)

[00392] In Equation 10, E{y}(n) represents the estimate of E{y} using samples up to block n. In this example, cci(l) is computed only for those channels that are in coupling in the current block. For the purpose of smoothing the power estimates given only real-valued MDCT coefficients, a value of a = 0.2 has been found to be sufficient. For transforms other than the MDCT, and specifically for complex transforms, a larger value of a can be used. In such cases, a value of a in the range 0.2 < a < 0.5 would be acceptable. Some lower-complexity implementations may involve time-smoothing the computed correlation coefficients cci(l) instead of the cross-correlations and powers. Although not mathematically equivalent to estimating the numerator and denominator separately, such lower-complexity smoothing has been found to provide a sufficiently accurate estimate of the cross-correlation coefficients. The particular implementation of the estimation function as a first-order IIR filter does not preclude implementations based on other schemes, such as one based on a first-in, first-out ("FIFO") buffer. In such implementations, the oldest sample in the buffer can be subtracted from the current estimate E{}, while the newest sample can be added to the current estimate E{}. [00393] In some implementations, the smoothing process takes into account whether the sDi coefficients were in coupling in the previous block. For example, if channel i was not in coupling in the previous block, then for the current block a can be set to 1.0, as the MDCT coefficients of the previous block would not have been included in the coupling channel. Moreover, the previous MDCT transform could have been encoded using the E-AC-3 short block mode, which further justifies setting a to 1.0 in this case. [00394] At this stage, the cross-correlation coefficients between the individual channels and a composite coupling channel have been determined. In the example of Figure 10B, the processes corresponding to blocks 1022 to 1045 have been performed. The processes that follow are examples of estimating spatial parameters based on the cross-correlation coefficients. These processes are examples of block 1050 of method 1020. [00395] In one example, using the cross-correlation coefficients for the frequency bands below KCPL (the lowest frequency of the received coupling channel frequency range), an estimate of the alphas to be used for decorrelation of the MDCT coefficients above KCPL can be generated.
The pseudocode for computing the estimated alphas from the cci(l) values according to one such implementation is as follows:

for (reg = 0; reg < numRegions; reg++) {
  for (chan = 0; chan < numChans; chan++) {

[00396] Compute the mean ICC and variance for the current region:

    CCm = MeanRegion(chan, iCCs, blockStart[reg], blockEnd[reg]);
    CCv = VarRegion(chan, iCCs, blockStart[reg], blockEnd[reg]);
    for (block = blockStart[reg]; block < blockEnd[reg]; block++) {

[00397] If the channel is not in coupling, then skip the block:

      if (chanNotInCpl[block][chan]) continue;
      fAlphaRho = CCm * MAPPED_VAR_RHO;
      fAlphaRho = (fAlphaRho > -1.0f) ? fAlphaRho : -1.0f;
      fAlphaRho = (fAlphaRho < 1.0f) ? fAlphaRho : 0.99999f;
      for (band = cplStartBand[block]; band < iBandEnd[block]; band++) {
        iAlphaRho = floor(fAlphaRho * 128) + 128;
        fEstimatedValue = fAlphaRho + w[iNoiseIndex++] * Vb[band] * Vm[iAlphaRho] * sqrt(CCv);
        fAlphaRho = fAlphaRho * MAPPED_VAR_RHO;
        EstAlphaArray[block][chan][band] = Smooth(fEstimatedValue);
      }
    } /* end block loop */
  } /* end channel loop */
} /* end region loop */

[00398] A key input to the above extrapolation process that generates the alphas is CCm, which represents the average of the correlation coefficients cci(l) over the current region. A "region" can be an arbitrary grouping of consecutive E-AC-3 blocks. An E-AC-3 frame can consist of more than one region. However, in some implementations, regions do not span frame boundaries. CCm can be computed as follows (denoted by the MeanRegion() function in the above pseudocode):

CCm(i) = (1/(N·L))·Σn=1..N Σl=1..L cci(l, n) (Equation 11)

[00399] In Equation 11, i represents the channel index, L represents the number of low-frequency bands (below KCPL) used for the estimation, and N represents the number of blocks within the current region. Here, the cci(l) notation is extended to include the block index n: cci(l, n). The average cross-correlation coefficient can then be extrapolated to the received coupling channel frequency range by repeatedly applying the following scaling operation to generate a predicted alpha value for each coupling channel frequency band:

fAlphaRho = fAlphaRho * MAPPED_VAR_RHO (Equation 12)

[00400] When applying Equation 12, the fAlphaRho for the first coupling channel frequency band is CCm(i) * MAPPED_VAR_RHO. In the pseudocode example, the value of the MAPPED_VAR_RHO variable was derived heuristically from the observation that the average alpha values tend to decrease with increasing band index. As such, MAPPED_VAR_RHO is defined to be less than 1.0. In some implementations, MAPPED_VAR_RHO is set to 0.98. [00401] At this stage, the spatial parameters (alphas in this example) have been estimated. In the example of Figure 10B, the processes corresponding to blocks 1022 to 1050 have been performed. The processes that follow are examples of adding noise to, or "dithering", the estimated spatial parameters. These processes are examples of block 1055 of method 1020. [00402] Based on an analysis of how the prediction error varies with frequency for a large corpus of different types of multichannel input signals, the inventors formulated heuristic rules that control the degree of randomization imposed on the estimated alpha values. The estimated spatial parameters in the coupling channel frequency range (obtained by computing the correlation at lower frequencies followed by extrapolation) should ideally have the same statistics as if these parameters had been calculated directly in the coupling channel frequency range from the original signal, when all individual channels were available without being coupled.
The purpose of adding noise is to provide statistical variation similar to that observed empirically. In the above pseudocode, Vb represents an empirically derived scaling term that dictates how the variance changes as a function of the band index. Vm represents an empirically derived scaling term that is based on the predicted alpha before the synthesized variance is applied. This accounts for the fact that the prediction error variance is in fact a function of the prediction itself. For example, when the linear prediction of alpha for a band is close to 1.0, the variance is very low. The term CCv represents a control based on the local variance of the cci values computed over the current region of blocks. CCv can be computed as follows (denoted by the VarRegion() function in the pseudocode above):

CCv(i) = (1/(N·L))·Σn=1..N Σl=1..L (cci(l, n) − CCm(i))²

[00403] In this example, Vb controls the dither variance according to the band index. Vb was derived empirically by examining the variance, across bands, of the alpha prediction error calculated from the source. The inventors found that the relationship between the normalized variance and the band index l can be modeled according to Equation 13. [00404] Figure 10C is a graph that indicates the relationship between the scaling term Vb and the band index l. Figure 10C shows that the incorporation of the Vb term leads to an estimated alpha having progressively greater variance as a function of the band index. In Equation 13, a band index l < 3 corresponds to the region below 3.42 kHz, the lowest coupling start frequency of the E-AC-3 audio codec. Therefore, the Vb values for those band indices are immaterial. [00405] The Vm parameter was derived by examining the behavior of the alpha prediction error as a function of the prediction itself. In particular, the inventors found, through analysis of a large corpus of multichannel content, that when the predicted alpha value is negative, the prediction error variance increases, with a peak at alpha = -0.59375. This implies that when the current channel under analysis is negatively correlated with the downmix xD, the estimated alpha tends, in general, to be more chaotic. This behavior is modeled by Equation 14. [00406] In Equation 14, q represents the quantized version of the prediction (denoted by fAlphaRho in the pseudocode), and can be computed according to: q = floor(fAlphaRho*128) [00407] Figure 10D is a graph that indicates the relationship between the variables Vm and q. Note that Vm is normalized by its value at q = 0, so that Vm modifies the other factors that contribute to the prediction error variance. Thus, the term Vm only affects the overall prediction error variance for values other than q = 0. In the pseudocode, the symbol iAlphaRho is set to q + 128. This mapping avoids the need for negative iAlphaRho values and allows the Vm(q) values to be read directly from a data structure such as a table. [00408] In this implementation, the next step is to scale the random variable w by the three factors Vm, Vb and CCv. The geometric mean between Vm and CCv can be computed and applied as a scaling factor to the random variable. In some implementations, w can be implemented as a very large table of random numbers with a Gaussian distribution of zero mean and unit variance. [00409] After the scaling process, a smoothing process can be applied. For example, the dithered estimated spatial parameters can be smoothed over time, for example using a FIFO-based smoother or a simple zero-pole IIR filter. The smoothing coefficient can be set to 1.0 if the previous block was not in coupling, or if the current block is the first block in a region of blocks.
Thus, the scaled random number from the noise record w can be low-pass filtered, which has been found to make the variance of the estimated alpha values correspond better to the variance of the alphas at the source. In some implementations, this smoothing process may be less aggressive (i.e., an IIR filter with a shorter impulse response) than the smoothing used for the cci(l) values. [00410] As noted above, the processes involved in estimating alphas and/or other spatial parameters may be performed, at least in part, by a control information receiver/generator 640 such as the one illustrated in Figure 6C. In some implementations, the transient control module 655 of the control information receiver/generator 640 (or one or more other components of an audio processing system) can be configured to provide transient-related functionality. Some examples of detecting transients, and of controlling a decorrelation process accordingly, will now be described with reference to Figures 11A et seq. [00411] Figure 11A is a flow diagram that outlines some methods of transient determination and transient-related controls. In block 1105, audio data corresponding to a plurality of audio channels is received, for example, by a decoding device or other such audio processing system. As described below, in some implementations, similar processes may be performed by an encoding device. [00412] Figure 11B is a block diagram that includes examples of various components for transient determination and transient-related controls. In some implementations, block 1105 may involve receiving the audio data 220 and the audio data 245 by an audio processing system that includes the transient control module 655. The audio data 220 and 245 may include frequency domain representations of audio signals. The audio data 220 may include audio data elements in a coupling channel frequency range, while the audio data elements 245 may include audio data outside the coupling channel frequency range. The audio data elements 220 and/or 245 can be routed to a decorrelator that includes the transient control module 655. [00413] In addition to the audio data elements 245 and 220, the transient control module 655 can receive other associated audio information, such as the decorrelation information 240a and 240b, in block 1105. In this example, the decorrelation information 240a may include explicit, decorrelation-specific control information. For example, the decorrelation information 240a may include explicit transient information such as that described below. The decorrelation information 240b may include information from a bitstream of a legacy audio codec. For example, the decorrelation information 240b may include time segmentation information that is available in a bitstream encoded according to the AC-3 audio codec or the E-AC-3 audio codec. For example, the decorrelation information 240b may include coupling-in-use information, block switching information, exponent information, exponent strategy information, etc. Such information may have been received by an audio processing system in a bitstream along with the audio data 220. [00414] Block 1110 involves determining audio characteristics of the audio data. In various implementations, block 1110 involves determining transient information, for example, by the transient control module 655. Block 1115 involves determining an amount of decorrelation for the audio data based, at least in part, on the audio characteristics. For example, block 1115 may involve determining decorrelation control information based, at least in part, on transient information.
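By way of illustration, one way the transient information of block 1110 might be tracked as a control value from block to block is sketched below. The maximum-of-decayed-values behavior follows the exponential decay discussion accompanying Figures 11C and 11D below; the decay constant, the Python usage and the names are illustrative assumptions:

class TransientControl:
    # Track a transient control value in [0, 1]: the newly determined value
    # is compared against the exponentially decayed previous value and the
    # maximum is kept, so that control releases smoothly rather than
    # switching abruptly.
    def __init__(self, decay=0.8):
        self.decay = decay  # per-block decay factor (illustrative value)
        self.value = 0.0

    def update(self, new_value):
        self.value = max(float(new_value), self.value * self.decay)
        return self.value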
[00415] In block 1115, the transient control module 655 of Figure 11B can provide decorrelation signal generator control information 625 to a decorrelation signal generator, such as the decorrelation signal generator 218 described elsewhere in this document. In block 1115, the transient control module 655 can also provide mixer control information 645 to a mixer, such as the mixer 215. In block 1120, the audio data can be processed in accordance with the determinations made in block 1115. For example, the operations of the decorrelation signal generator 218 and the mixer 215 can be performed, at least in part, in accordance with the decorrelation control information provided by the transient control module 655. [00416] In some implementations, block 1110 of Figure 11A may involve receiving explicit transient information along with the audio data and determining the transient information, at least in part, according to the explicit transient information. [00417] In some implementations, the explicit transient information may indicate a transient value that corresponds to a definite transient event. Such a transient value can be a relatively high (or maximum) transient value. A high transient value may correspond to a high likelihood and/or a high severity of a transient event. For example, if transient values are possible in the range from 0 to 1, a range of transient values between 0.9 and 1 may correspond to a severe and/or definite transient event. However, any appropriate range of transient values can be used, for example, 0 to 9, 1 to 100, etc. [00418] The explicit transient information may indicate a transient value that corresponds to a definite non-transient event. For example, if transient values in the range from 1 to 100 are possible, a value in the range from 1 to 5 may correspond to a definite non-transient event or a very mild transient event. [00419] In some implementations, the explicit transient information may have a binary representation, for example, either 0 or 1. For example, a value of 1 may correspond to a definite transient event. However, a value of 0 may not indicate a definite non-transient event. Instead, in some such implementations, a value of 0 may simply indicate the absence of a definite and/or severe transient event. [00420] However, in some implementations, the explicit transient information may include transient values intermediate between a minimum transient value (e.g., 0) and a maximum transient value (e.g., 1). An intermediate transient value may correspond to an intermediate likelihood and/or an intermediate severity of a transient event. [00421] The decorrelation filter input control module 1125 of Figure 11B can determine transient information in block 1110 according to explicit transient information received via the decorrelation information 240a. Alternatively or additionally, the decorrelation filter input control module 1125 may determine transient information in block 1110 according to information from a bitstream of a legacy audio codec. For example, based on the decorrelation information 240b, the decorrelation filter input control module 1125 can determine that channel coupling is not in use for the current block, that the channel is not in coupling in the current block and/or that the channel is block-switched in the current block. [00422] Based on the decorrelation information 240a and/or 240b, the decorrelation filter input control module 1125 can sometimes determine, in block 1110, a transient value that corresponds to a definite transient event.
In some implementations, the decorrelation filter input control module 1125 may determine in block 1115 that a decorrelation process (and/or a decorrelation filter dithering process) should be temporarily halted. Thus, in block 1120 the decorrelation filter input control module 1125 may generate decorrelation signal generator control information 625e indicating that a decorrelation process (and/or a decorrelation filter dithering process) is to be temporarily halted. Alternatively or additionally, in block 1120 the soft transient calculator 1130 may generate decorrelation signal generator control information 625f, indicating that a decorrelation filter dithering process is to be temporarily halted or slowed. [00423] In alternative implementations, block 1110 may involve receiving no explicit transient information with the audio data. However, whether or not explicit transient information is received, some implementations of method 1100 may involve detecting a transient event according to an analysis of the audio data 220. For example, in some implementations, a transient event may be detected at block 1110 even when the explicit transient information does not indicate a transient event. A transient event that is determined or detected by a decoder, or similar audio processing system, according to an analysis of the audio data 220 may be referred to herein as a "soft transient event". [00424] In some implementations, whether a transient value is provided as an explicit transient value or determined as a soft transient value, the transient value may be subjected to an exponential decay function. For example, the exponential decay function can cause the transient value to decay smoothly from an initial value to zero over a period of time. Subjecting a transient value to an exponential decay function can avoid artifacts associated with abrupt switching. [00425] In some implementations, detecting a soft transient event may involve evaluating the likelihood and/or the severity of a transient event. Such evaluations may involve calculating a temporal power variation in the audio data 220. [00426] Figure 11C is a flow diagram that outlines some methods of determining transient control values based, at least in part, on temporal power variations of the audio data. In some implementations, method 1150 can be performed, at least in part, by the soft transient calculator 1130 of the transient control module 655. However, in some implementations, method 1150 can be performed by an encoding device. In some such implementations, the explicit transient information may be determined by the encoding device according to method 1150 and included in a bitstream along with other audio data. [00427] Method 1150 starts with block 1152, where upmixed audio data in a coupling channel frequency range is received. In Figure 11B, for example, the upmixed audio data elements 220 may be received by the soft transient calculator 1130 in block 1152. In block 1154, the received coupling channel frequency range is divided into one or more frequency bands, which may also be referred to herein as "power bands". [00428] Block 1156 involves computing the frequency-banded weighted logarithmic power ("WLP") for each channel and block of the upmixed audio data. To compute the WLP, the power of each power band can be determined. These powers can be converted to logarithmic values and then averaged across the power bands.
In some implementations, block 1156 can be performed as follows:

WLP[ch][blk] = meanpwr_bnd{log(P[ch][blk][pwr_bnd])} (Equation 15)

[00429] In Equation 15, WLP[ch][blk] represents the weighted logarithmic power for a channel and block, pwr_bnd represents a frequency band or "power band" into which the received coupling channel frequency range has been divided, and meanpwr_bnd{log(P[ch][blk][pwr_bnd])} represents the average, across the power bands, of the logarithms of the power in each power band. [00430] The banding can pre-emphasize power variations at higher frequencies, for the following reasons. If the entire coupling channel frequency range were one band, then P[ch][blk][pwr_bnd] would be the arithmetic mean of the power at each frequency in the coupling channel frequency range, and the lower frequencies, which typically have greater power, would tend to dominate the value of P[ch][blk][pwr_bnd] and therefore the value of log(P[ch][blk][pwr_bnd]). (In this case, log(P[ch][blk][pwr_bnd]) would have the same value as the average of log(P[ch][blk][pwr_bnd]), because there would be only one band.) Transient detection would then be based largely on temporal variation at lower frequencies. Instead, dividing the coupling channel frequency range into, for example, a lower frequency band and an upper frequency band, and then averaging the powers of the two bands in the log domain, is equivalent to calculating the geometric mean of the power of the lower frequencies and the power of the upper frequencies. Such a geometric mean is closer to the power of the upper frequencies than an arithmetic mean would be. Therefore, banding, taking the logarithm of the power and then averaging will tend to produce a quantity that is more sensitive to temporal variation at higher frequencies. [00431] In this implementation, block 1158 involves determining an asymmetric power differential ("APD") based on the WLP. For example, the APD can be determined as follows:

dWLP[ch][blk] = WLP[ch][blk] − WLP[ch][blk − 2], if WLP[ch][blk] ≥ WLP[ch][blk − 2];
dWLP[ch][blk] = (WLP[ch][blk] − WLP[ch][blk − 2]) / 2, otherwise (Equation 16)

[00432] In Equation 16, dWLP[ch][blk] represents the differential weighted logarithmic power for a channel and block, and WLP[ch][blk − 2] represents the weighted logarithmic power of the channel two blocks earlier. The example of Equation 16 is useful for processing audio data encoded using audio codecs such as E-AC-3 and AC-3, in which there is a 50% overlap between consecutive blocks. Thus, the WLP of the current block is compared to the WLP from two blocks earlier. If there is no overlap between consecutive blocks, the WLP of the current block can be compared to the WLP of the previous block. [00433] This example takes advantage of the possible temporal masking effect of previous blocks. Thus, if the WLP of the current block is greater than or equal to that of the earlier block (in this example, the WLP from two blocks earlier), the APD is set to the actual WLP differential. However, if the WLP of the current block is less than that of the earlier block, the APD is set to half the actual WLP differential. Thus, the APD emphasizes power increases and de-emphasizes power decreases. In other implementations, a different fraction of the actual WLP differential might be used. [00434] Block 1160 may involve determining a raw transient measure ("RTM") based on the APD.
In this implementation, determining the raw transient measure involves computing a probability function for transient events, based on the assumption that the asymmetric temporal power differential is distributed according to a Gaussian distribution:

RTM[ch][blk] = 1 − exp(−dWLP[ch][blk]² / (2·SAPD²)) (Equation 17)

[00435] In Equation 17, RTM[ch][blk] represents the raw transient measure for a channel and a block, and SAPD represents a tuning parameter. In this example, when SAPD is increased, a relatively larger power differential is required to produce the same RTM value. [00436] A transient control value, which may also be referred to herein as a "transient measure", can be determined from the RTM in block 1162. In this example, the transient control value is determined according to Equation 18:

TM[ch][blk] = 1.0, if RTM[ch][blk] ≥ TH;
TM[ch][blk] = 0.0, if RTM[ch][blk] ≤ TL;
TM[ch][blk] = (RTM[ch][blk] − TL) / (TH − TL), otherwise (Equation 18)

[00437] In Equation 18, TM[ch][blk] represents the transient measure for a channel and a block, TH represents an upper threshold and TL represents a lower threshold. Figure 11D provides an example of the application of Equation 18 and of how the thresholds TH and TL can be used. Other implementations may involve other types of linear or nonlinear mappings from RTM to TM. According to some of these implementations, TM is a non-decreasing function of RTM. [00438] Figure 11D is a graph illustrating an example of mapping raw transient values to transient control values. In this example, both the raw transient values and the transient control values lie in the range from 0.0 to 1.0; however, other implementations may involve other ranges of values. As shown in Equation 18 and Figure 11D, if a raw transient value is greater than or equal to the upper threshold TH, the transient control value is set to its maximum value, which is 1.0 in this example. In some implementations, a maximum transient control value may correspond to a definite transient event. [00439] If a raw transient value is less than or equal to the lower threshold TL, the transient control value is set to its minimum value, which is 0.0 in this example. In some implementations, a minimum transient control value may correspond to a definite non-transient event. [00440] However, if a raw transient value falls within the range 1166 between the lower threshold TL and the upper threshold TH, the transient control value is scaled to an intermediate transient control value, which is between 0.0 and 1.0 in this example. The intermediate transient control value can correspond to a relative likelihood and/or a relative severity of a transient event. [00441] Referring again to Figure 11C, in block 1164, an exponential decay function can be applied to the transient control value that is determined in block 1162. For example, the exponential decay function can cause the transient control value to decay smoothly from an initial value to zero over a period of time. Subjecting a transient control value to an exponential decay function can prevent artifacts associated with abrupt switching. In some implementations, the transient control value for each current block can be calculated and compared to the exponentially decayed version of the transient control value from the previous block. The final transient control value for the current block can be set to the maximum of the two transient control values. [00442] Transient information, whether received along with other audio data or determined by a decoder, can be used to control decorrelation processes. The transient information can include transient control values such as those described above.
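Pulling Equations 15 through 18 together, a minimal per-channel, per-block sketch of this chain is given below. The threshold and tuning values, the Python and NumPy usage and the names are illustrative assumptions:

import numpy as np

def transient_measure(band_powers, band_powers_two_blocks_ago,
                      s_apd=1.0, t_low=0.1, t_high=0.2):
    # Equations 15-18 in sequence: weighted logarithmic power (WLP),
    # asymmetric power differential (APD), raw transient measure (RTM)
    # and the thresholded transient control value (TM).
    wlp_now = np.mean(np.log(band_powers))                  # Equation 15
    wlp_old = np.mean(np.log(band_powers_two_blocks_ago))
    d = wlp_now - wlp_old
    apd = d if d >= 0.0 else d / 2.0                        # Equation 16
    rtm = 1.0 - np.exp(-0.5 * (apd / s_apd) ** 2)           # Equation 17
    return float(np.clip((rtm - t_low) / (t_high - t_low),  # Equation 18
                         0.0, 1.0))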
In some implementations, the amount of decorrelation for the audio data may be modified (e.g., reduced) based, at least in part, on such transient information. [00443] As described above, such decorrelation processes may involve applying a decorrelation filter to a portion of the audio data in order to produce filtered audio data, and mixing the filtered audio data with a portion of the received audio data according to a mixing ratio. Some implementations may involve controlling the mixer 215 based on transient information. For example, such implementations may involve modifying the mixing ratio based, at least in part, on transient information. Such transient information may, for example, be included in the mixer control information 645 by the mixer transient control module 1145 (see Figure 11B). [00444] According to some implementations, transient control values can be used by the mixer 215 to modify the alphas in order to suspend or reduce decorrelation during transient events. For example, the alphas can be modified according to the following pseudocode:

if (alpha[ch][bnd] >= 0)
  alpha[ch][bnd] = alpha[ch][bnd] + (1 - alpha[ch][bnd]) * decorrelationDecayArray[ch];
else
  alpha[ch][bnd] = alpha[ch][bnd] + (-1 - alpha[ch][bnd]) * decorrelationDecayArray[ch];

[00445] In the above pseudocode, alpha[ch][bnd] represents the alpha value of a frequency band for a channel. The term decorrelationDecayArray[ch] represents an exponential decay variable that takes a value in the range from 0 to 1. In some examples, the alphas can be moved toward +/-1 during transient events. The extent of the modification is proportional to decorrelationDecayArray[ch], which reduces the mixing weights for the decorrelation signals toward 0 and thus suspends or reduces the decorrelation. The exponential decay of decorrelationDecayArray[ch] slowly restores the normal decorrelation process. [00446] In some implementations, the soft transient calculator 1130 can provide soft transient information to the spatial parameter module 665. Based, at least in part, on the soft transient information, the spatial parameter module 665 can select a smoother, either for smoothing the spatial parameters received in the bitstream or for smoothing the energies and other quantities involved in spatial parameter estimation. [00447] Some implementations may involve controlling the decorrelation signal generator 218 according to transient information. For example, such implementations may involve modifying or temporarily halting a decorrelation filter dithering process based, at least in part, on transient information. This can be advantageous because dithering the poles of an all-pass filter during transient events can cause unwanted ripple artifacts. In some implementations, the maximum step value for dithering the poles of a decorrelation filter can be modified based, at least in part, on transient information. [00448] For example, the soft transient calculator 1130 can supply the decorrelation signal generator control information 625f to the decorrelation filter module 405 of the decorrelation signal generator 218 (see also Figure 4). The decorrelation filter module 405 can generate the time-varying filters 1127 in response to the decorrelation signal generator control information 625f.
[00446] In some implementations, the soft transient calculator 1130 can provide soft transient information to the spatial parameter module 665. Based, at least in part, on the soft transient information, the spatial parameter module 665 can select a smoother, either to smooth the spatial parameters received in the bit stream or to smooth the energies and other quantities involved in spatial parameter estimation.

[00447] Some implementations may involve controlling the decorrelation signal generator 218 according to transient information. For example, such implementations may involve modifying, or temporarily halting, a dithering process of a decorrelation filter based, at least in part, on transient information. This can be advantageous because dithering the poles of an all-pass filter during transient events can cause unwanted ripple artifacts. In some implementations, the maximum step value for dithering the poles of a decorrelation filter can be modified based, at least in part, on transient information.

[00448] For example, the soft transient calculator 1130 can supply decorrelation signal generator control information 625f to the decorrelation filter module 405 of the decorrelation signal generator 218 (see also Figure 4). The decorrelation filter module 405 can generate the time-varying filters 1127 in response to the decorrelation signal generator control information 625f. According to some implementations, the decorrelation signal generator control information 625f may include information for controlling the maximum step value according to the maximum value of an exponential decay variable.

[00449] For example, the maximum step value can be multiplied by an expression based on that exponential decay variable when transient events are detected on any channel. The dithering process can be interrupted or slowed as a result.

[00450] In some implementations, a gain may be applied to the filtered audio data based, at least in part, on transient information. For example, the power of the filtered audio data can be matched to the power of the direct audio data. In some implementations, such functionality may be provided by the signal level compressor module 1135 of Figure 11B.

[00451] The signal level compressor module 1135 can receive transient information, such as transient control values, from the soft transient calculator 1130, determine decorrelation signal generator control information 625h according to those transient control values, and supply that control information to the decorrelation signal generator 218. For example, the decorrelation signal generator control information 625h may include a gain value that the decorrelation signal generator 218 can apply to the decorrelation signals 227 in order to keep the power of the filtered audio data at a level that is less than or equal to the power of the direct audio data. The signal level compressor module 1135 can determine the decorrelation signal generator control information 625h by calculating, for each channel involved in the coupling, the energy per frequency band within the frequency range of the coupling channel.

[00452] The signal level compressor module 1135 may include, for example, a bank of signal level compressors. In some of these implementations, the signal level compressors may include buffers to temporarily store the per-band energies, within the coupling channel frequency range, determined by the signal level compressor module 1135. A fixed delay may be applied to the filtered audio data, and the same delay can be applied to the buffers.

[00453] The signal level compressor module 1135 can also determine mixer-related information and provide it to the mixer transient control module 1145. In some implementations, the signal level compressor module 1135 can provide information for controlling the mixer 215 so as to modify the mixing ratio based on a gain to be applied to the filtered audio data. According to some of these implementations, the signal level compressor module 1135 can provide information for controlling the mixer 215 so as to suspend or reduce decorrelation during transient events. For example, the signal level compressor module 1135 can provide the following mixer-related information:

    TransCtrlFlag = max(decorrelationDecayArray[ch], 1 - DecorrGain[ch][bnd]);
    if (alpha[ch][bnd] >= 0)
        alpha[ch][bnd] = alpha[ch][bnd] + (1 - alpha[ch][bnd]) * TransCtrlFlag;
    else
        alpha[ch][bnd] = alpha[ch][bnd] + (-1 - alpha[ch][bnd]) * TransCtrlFlag;

[00454] In the pseudocode above, TransCtrlFlag represents a transient control value and DecorrGain[ch][bnd] represents the gain to be applied to one band of one channel of the filtered audio data.

[00455] In some implementations, a power estimation smoothing window for the signal level compressors may be based, at least in part, on transient information. For example, a shorter smoothing window can be applied when a transient event is relatively more likely, or when a relatively stronger transient event is detected. A longer smoothing window can be applied when a transient event is relatively less likely, when a relatively weaker transient event is detected, or when no transient event is detected. For example, the smoothing window length can be adjusted dynamically, based on the transient control values, so that the window is shorter when the control value is close to its maximum (e.g., 1.0) and longer when it is close to its minimum (e.g., 0.0). Such implementations can help avoid time truncation during transient events while also producing smooth gain factors in non-transient situations; a sketch of this adaptive smoothing follows.
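The following is a minimal sketch of such transient-adaptive power smoothing, assuming a one-pole smoother whose effective window length is interpolated between two bounds by the transient control value. The bounds, the interpolation rule and the function names are illustrative assumptions; the disclosure does not specify them.

    import numpy as np

    def smoothing_coefficient(tm, min_len=2.0, max_len=32.0):
        # Map the transient control value tm in [0, 1] to a one-pole
        # smoothing coefficient: a short effective window (min_len blocks)
        # when tm is near 1.0, a long one (max_len blocks) when tm is near 0.0.
        length = max_len + (min_len - max_len) * tm
        return 1.0 - 1.0 / length

    def smooth_band_energy(energies, tms):
        # Track a per-band energy estimate across blocks with a smoother
        # whose time constant tightens whenever a transient is likely.
        est = energies[0]
        out = []
        for e, tm in zip(energies, tms):
            a = smoothing_coefficient(tm)
            est = a * est + (1.0 - a) * e
            out.append(est)
        return np.array(out)

A ducking gain could then be derived from energies smoothed in this way, for example min(1, sqrt(E_direct / E_filtered)) per band; that particular gain rule is an assumption here, as this passage does not specify one.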
[00456] As noted above, in some implementations transient information may be determined by an encoding device. Figure 11E is a flowchart that outlines a method for encoding transient information. In block 1172, audio data corresponding to a plurality of audio channels are received. In this example, the audio data are received by an encoding device. In some implementations, the audio data can be transformed from the time domain to the frequency domain (optional block 1174).

[00457] In block 1176, audio characteristics, including transient information, are determined. For example, the transient information can be determined as described above with reference to Figures 11A through 11D. Block 1176 may involve evaluating a temporal power variation in the audio data and determining transient control values according to that variation. Such transient control values may indicate a definite transient event, a definite non-transient event, the probability of a transient event and/or the severity of a transient event. Block 1176 may also involve applying an exponential decay function to the transient control values.

[00458] In some implementations, the audio characteristics determined in block 1176 may include spatial parameters, which may be determined substantially as described elsewhere in this document. However, instead of calculating correlations outside the coupling channel frequency range, the spatial parameters can be determined by calculating correlations within the coupling channel frequency range. For example, the alphas for an individual channel that will be encoded with coupling can be determined by calculating, on a frequency band basis, the correlations between the transform coefficients of that channel and those of the coupling channel. In some implementations, the encoder can determine the spatial parameters using complex frequency representations of the audio data.

[00459] Block 1178 involves coupling at least a portion of two or more channels of the audio data into a coupled channel. For example, the frequency domain representations of the audio data for the coupled channels that lie within a coupling channel frequency range may be combined in block 1178. In some implementations, more than one coupled channel may be formed in block 1178 (a sketch of channel coupling follows this overview).

[00460] In block 1180, frames of encoded audio data are formed. In this example, the frames of encoded audio data include data corresponding to the coupled channel(s) and the encoded transient information determined in block 1176. For example, the encoded transient information may include one or more control flags, such as a channel block-switching flag, a channel out-of-coupling flag and/or a coupling-in-use flag. Block 1180 may involve determining a combination of one or more of these control flags to form encoded transient information that indicates a definite transient event, a definite non-transient event, the probability of a transient event or the severity of a transient event.
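As an illustration of block 1178 above, the sketch below forms one coupled channel by combining the in-range frequency coefficients of the channels being coupled. The simple per-bin sum, the bin range and the function name are illustrative assumptions; the disclosure does not prescribe a particular combination rule here.

    import numpy as np

    def form_coupled_channel(coeffs, start_bin, end_bin):
        # coeffs: (num_channels, num_bins) frequency-domain coefficients,
        # e.g. MDCT coefficients for one block. Combine the coefficients of
        # all channels over the coupling channel frequency range
        # [start_bin, end_bin); a plain per-bin sum is assumed here.
        return coeffs[:, start_bin:end_bin].sum(axis=0)

    # Example: couple two channels over bins 32..256 of a 512-bin block.
    rng = np.random.default_rng(0)
    mdct = rng.standard_normal((2, 512))
    coupled = form_coupled_channel(mdct, 32, 256)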
[00461] Whether or not it is formed from control flags, the encoded transient information may include information for controlling a decorrelation process. For example, the transient information may indicate that the decorrelation process should be temporarily halted, that the amount of decorrelation applied in a decorrelation process should be temporarily reduced, or that a mixing ratio of a decorrelation process should be modified.

[00462] Frames of encoded audio data can also include various other types of audio data, including audio data for individual channels outside the coupling channel frequency range, audio data for uncoupled channels, and so on. In some implementations, frames of encoded audio data may also include spatial parameters, coupling coordinates and/or other types of secondary information, such as those described elsewhere in this document.

[00463] Figure 12 is a block diagram that provides examples of components of an apparatus that can be configured to implement aspects of the processes described in this document. Device 1200 can be a mobile phone, a smartphone, a desktop computer, a handheld or laptop computer, a netbook, a notebook, a smartbook, a tablet, a stereo system, a television, a DVD player, a digital recording device or any of a variety of other devices. Device 1200 may include an encoding tool and/or a decoding tool. The components illustrated in Figure 12 are merely examples: a particular device may be configured to implement various embodiments described in this document yet may or may not include all of these components. For example, some implementations may not include a speaker or a microphone.

[00464] In this example, the device includes an interface system 1205. The interface system 1205 may include a network interface, such as a wireless network interface. Alternatively or additionally, the interface system 1205 may include a universal serial bus (USB) interface or another similar interface.

[00465] Device 1200 includes a logic system 1210. The logic system 1210 may include a processor, such as a general-purpose single-chip or multi-chip processor. The logic system 1210 may include a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic, discrete hardware components, or combinations thereof. The logic system 1210 can be configured to control the other components of device 1200. Although interfaces between the components of device 1200 are not shown in Figure 12, the logic system 1210 can be configured to communicate with the other components; the other components may or may not be configured to communicate with one another, as appropriate.

[00466] The logic system 1210 may be configured to perform various types of audio processing functionality, such as encoder and/or decoder functionality. Such encoder and/or decoder functionality may include, but is not limited to, the types of encoder and/or decoder functionality described herein. For example, the logic system 1210 can be configured to provide the decorrelator functionality described in this document.
In some of these implementations, the logic system 1210 can be configured to operate (at least in part) according to software stored on one or more non-transient media. The non-transient media may include memory associated with the logic system 1210, such as random access memory (RAM) and/or read-only memory (ROM). The non-transient media may include memory of a memory system 1215. The memory system 1215 may include one or more suitable types of non-transient storage media, such as flash memory, a hard disk drive and so on.

[00467] For example, the logic system 1210 may be configured to receive frames of encoded audio data via the interface system 1205 and to decode the encoded audio data in accordance with the methods described herein. Alternatively or additionally, the logic system 1210 can be configured to receive frames of encoded audio data via an interface between the memory system 1215 and the logic system 1210. The logic system 1210 can be configured to control the speaker(s) 1220 according to the decoded audio data. In some implementations, the logic system 1210 may be configured to encode audio data in accordance with conventional encoding methods and/or in accordance with the encoding methods described herein. The logic system 1210 may be configured to receive such audio data via the microphone 1225, via the interface system 1205, and so on.

[00468] The display system 1230 may include one or more suitable types of display, depending on the nature of device 1200. For example, the display system 1230 may include a liquid crystal display, a plasma display, a bistable display, and so on.

[00469] The user input system 1235 may include one or more devices configured to accept input from a user. In some implementations, the user input system 1235 may include a touch screen that overlays a display of the display system 1230. The user input system 1235 may include buttons, a keyboard, switches and the like. In some implementations, the user input system 1235 may include the microphone 1225: a user may issue voice commands to device 1200 via the microphone 1225. The logic system may be configured for speech recognition and for controlling at least some operations of device 1200 according to such voice commands.

[00470] The power system 1240 may include one or more suitable energy storage devices, such as a nickel-cadmium battery or a lithium-ion battery. The power system 1240 can be configured to receive power from an electrical outlet.

[00471] Various modifications to the implementations described in this disclosure may be readily apparent to those of ordinary skill in the art. The general principles set out in this document can be applied to other implementations without departing from the spirit and scope of this disclosure. For example, while various implementations have been described in terms of Dolby Digital and Dolby Digital Plus, the methods described in this document can be used in combination with other audio codecs. Thus, the embodiments are not intended to be limited to the implementations shown in this document, but are to be accorded the broadest scope consistent with this disclosure, its principles and the novel features disclosed in this document.
Claims (15)

[0001] 1. Method, characterized in that it comprises the steps of: receiving, from a bit stream, audio data corresponding to a plurality of audio channels, the audio data comprising a frequency domain representation corresponding to filter bank coefficients of an audio coding system; and applying a decorrelation process to at least part of the audio data, the decorrelation process being performed with the same filter bank coefficients used by the audio coding system, wherein the decorrelation process involves applying a decorrelation algorithm that operates entirely on real-valued coefficients.

[0002] 2. Method according to claim 1, characterized in that the decorrelation process is performed without converting coefficients of the frequency domain representation into another frequency domain or time domain representation.

[0003] 3. Method according to claim 1 or 2, characterized in that the frequency domain representation is the result of applying a critically sampled, perfect-reconstruction filter bank.

[0004] 4. Method according to claim 3, characterized in that the decorrelation process involves generating reverberation signals or decorrelation signals by applying linear filters to at least a portion of the frequency domain representation.

[0005] 5. Method according to any one of claims 1 to 4, characterized in that the frequency domain representation is a result of applying a modified discrete sine transform, a modified discrete cosine transform or a lapped orthogonal transform to audio data in a time domain.

[0006] 6. Method according to any one of claims 1 to 5, characterized in that the decorrelation process involves selective or signal-adaptive decorrelation of specific channels and/or frequency bands.

[0007] 7. Method according to any one of claims 1 to 6, characterized in that the decorrelation process involves applying a decorrelation filter to a portion of the received audio data in order to produce filtered audio data.

[0008] 8. Method according to claim 7, characterized in that the decorrelation process involves using a non-hierarchical mixer to combine a direct portion of the received audio data with the filtered audio data according to spatial parameters.

[0009] 9. Method according to any one of claims 1 to 8, characterized in that it further comprises receiving decorrelation information with the audio data, wherein the decorrelation process involves decorrelating at least some of the audio data according to the received decorrelation information.

[0010] 10. Method according to claim 9, characterized in that the received decorrelation information includes at least one of: correlation coefficients between individual discrete channels and a coupling channel, correlation coefficients between individual discrete channels, explicit tonality information or transient information.

[0011] 11. Method according to any one of claims 1 to 10, characterized in that it further comprises determining decorrelation information based on the received audio data, wherein the decorrelation process involves decorrelating at least some of the audio data according to the determined decorrelation information.
[0012] 12. Method according to claim 11, characterized in that it further comprises receiving decorrelation information encoded with the audio data, wherein the decorrelation process involves decorrelating at least some of the audio data according to at least one of the received decorrelation information or the determined decorrelation information.

[0013] 13. Method according to any one of claims 1 to 12, characterized in that the audio coding system is a legacy audio coding system, and optionally further comprising receiving control mechanism elements in a bit stream produced by the legacy audio coding system, wherein the decorrelation process is based, at least in part, on the control mechanism elements.

[0014] 14. Apparatus, characterized in that it comprises: an interface; and a logic system configured to perform all the steps of the method as defined in any one of claims 1 to 13.

[0015] 15. Non-transient media, characterized in that it has a method stored thereon, the method being for controlling an apparatus to perform the method as defined in any one of claims 1 to 13.
Patent family:

Publication number | Publication date
ES2613478T3 | 2017-05-24
WO2014126682A1 | 2014-08-21
KR102114648B1 | 2020-05-26
US9830916B2 | 2017-11-28
RU2015133287A | 2017-02-21
CN104995676A | 2015-10-21
EP2956933A1 | 2015-12-23
JP6038355B2 | 2016-12-07
TW201443877A | 2014-11-16
US20150380000A1 | 2015-12-31
HK1213686A1 | 2016-07-08
JP2016510433A | 2016-04-07
TWI618050B | 2018-03-11
KR20150106949A | 2015-09-22
BR112015018981A2 | 2017-07-18
RU2614381C2 | 2017-03-24
EP2956933B1 | 2016-11-16
CN104995676B | 2018-03-30
IN2015MN01954A | 2015-08-28
Legal status:

2018-11-13 | B06F | Objections, documents and/or translations needed after an examination request [chapter 6.6 patent gazette]
2019-09-10 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
2021-11-30 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
2022-02-01 | B16A | Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]. Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 22/01/2014, SUBJECT TO THE LEGAL CONDITIONS.
Priority:

Application number | Filing date | Title
US201361764837P | 2013-02-14 |
US 61/764,837 | 2013-02-14 |
PCT/US2014/012453 (WO2014126682A1) | 2014-01-22 | Signal decorrelation in an audio processing system